diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index e4315c4..313dba6 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -23,7 +23,6 @@ repos:
hooks:
- id: ruff
- id: ruff-format
- args: ["--check"]
- repo: local
hooks:
diff --git a/Makefile b/Makefile
index 62f2b4b..4cd2511 100644
--- a/Makefile
+++ b/Makefile
@@ -14,8 +14,11 @@ download:
python -m src.download
transform:
+ifeq ($(WARN_DUPES), true)
+ python -m src.transform --warn-dupes
+else
python -m src.transform
-
+endif
all: download transform
diff --git a/README.md b/README.md
index 4e26c29..9cd7e0f 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,41 @@
# programapi
-Program API
+
+This project downloads the accepted speaker and submission data from Pretalx, processes it, and saves it as static JSON files, which are then served via an API.
+
+Used by the EuroPython 2024 website and the Discord bot.
+
+**What this project does step-by-step:**
+
+1. Downloads the Pretalx speaker and submission data, and saves it as JSON files.
+2. Transforms the JSON files into a format that is easier to work with and safe to serve publicly. This includes removing unnecessary/private fields and adding new fields.
+3. Serves the JSON files via an API.
+
+## Installation
+
+1. Clone the repository.
+2. Install the dependency management tool: ``make deps/pre``
+3. Install the dependencies: ``make deps/install``
+4. Set up ``pre-commit``: ``make pre-commit``
+
+## Configuration
+
+You can change the event in the [``config.py``](src/config.py) file. It is currently set to ``europython-2024``.
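+
+For reference, the event-related part of [``config.py``](src/config.py) looks roughly like this (excerpt; ``project_root`` and the token handling live in the same class):
+
+```python
+class Config:
+    event = "europython-2024"
+
+    raw_path = Path(f"{project_root}/data/raw/{event}")
+    public_path = Path(f"{project_root}/data/public/{event}")
+```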
+
+## Usage
+
+- Run the whole process: ``make all``
+- Run only the download process: ``make download``
+- Run only the transformation process: ``make transform``
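+
+The transformation step can also warn about duplicate session titles and speaker names. This is controlled by the ``WARN_DUPES`` Makefile variable, which makes the target run ``python -m src.transform --warn-dupes``:
+
+```bash
+make transform WARN_DUPES=true
+```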
+
+**Note:** Don't forget to set ``PRETALX_TOKEN`` in your ``.env`` file at the root of the project. And please don't make too many requests to the Pretalx API; it might get angry 🤪
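+
+A minimal ``.env`` file at the project root only needs that one variable (the value below is a placeholder, not a real token):
+
+```bash
+PRETALX_TOKEN=your-pretalx-api-token
+```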
+
+## API
+
+The API is served at ``https://programapi24.europython.eu/2024``. It has two endpoints (for now):
+
+- ``/speakers.json``: Returns the list of confirmed speakers.
+- ``/sessions.json``: Returns the list of confirmed sessions.
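+
+Both endpoints return a JSON object keyed by the speaker/session ``code``. For example, a quick way to inspect the data from Python (``requests`` is already one of the project's dependencies):
+
+```python
+import requests
+
+BASE_URL = "https://programapi24.europython.eu/2024"
+
+speakers = requests.get(f"{BASE_URL}/speakers.json", timeout=10).json()
+sessions = requests.get(f"{BASE_URL}/sessions.json", timeout=10).json()
+
+# Both responses map codes to objects, see the schema below
+print(f"{len(speakers)} speakers, {len(sessions)} sessions")
+```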
+
+## Schema
+
+See [this page](data/examples/README.md) for an explanation of the fields in the returned JSON files.
diff --git a/data/.gitignore b/data/.gitignore
new file mode 100644
index 0000000..2070759
--- /dev/null
+++ b/data/.gitignore
@@ -0,0 +1,3 @@
+# JSON files except the ones in examples/
+*.json
+!examples/**
diff --git a/data/examples/README.md b/data/examples/README.md
new file mode 100644
index 0000000..3ed1383
--- /dev/null
+++ b/data/examples/README.md
@@ -0,0 +1,139 @@
+# Explaining the output data
+
+**Note:** Some of the fields may be `null` or empty (`""`).
+
+## `sessions.json`
+
+Example session data JSON:
+
+```json
+{
+ "A1B2C3": {
+ "code": "A1B2C3",
+ "title": "Example talk",
+ "speakers": [
+ "B4D5E6",
+ ...
+ ],
+ "session_type": "Talk",
+ "slug": "example-talk",
+ "track": "Some Track",
+ "abstract": "This is an example talk. It is a great talk.",
+ "tweet": "This is an example talk.",
+ "duration": "60",
+ "level": "intermediate",
+ "delivery": "in-person",
+ "resources": [
+ {
+ "resource": "https://example.com/notebook.ipynb",
+ "description": "Notebook used in the talk"
+ },
+ {
+ "resource": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
+ "description": "Video of the robot in action"
+ }
+ ...
+ ],
+ "room": "South Hall 2A",
+ "start": "2024-07-10T14:00:00+02:00",
+ "end": "2024-07-10T15:00:00+02:00",
+ "website_url": "https://ep2024.europython.eu/session/example-talk/",
+ "sessions_in_parallel": [
+ "F7G8H9",
+ ...
+ ],
+ "sessions_after": [
+ "I0J1K2",
+ ...
+ ],
+ "sessions_before": [
+ "L3M4N5",
+ ...
+ ],
+ "next_session": "O6P7Q8",
+ "prev_session": "R9S0T1"
+  },
+  ...
+}
+```
+
+The fields are as follows:
+
+| Key | Type | Notes |
+|------------------------|-------------------------------------------|---------------------------------------------------------------|
+| `code` | `string` | Unique identifier for the session |
+| `title` | `string` | Title of the session |
+| `speakers` | `array[string]` | List of codes of the speakers |
+| `session_type` | `string` | Type of the session (e.g. Talk, Workshop, Poster, etc.) |
+| `slug` | `string` | URL-friendly version of the title |
+| `track` | `string` \| `null` | Track of the session (e.g. PyData, Web, etc.) |
+| `abstract` | `string` | Abstract of the session |
+| `tweet` | `string` | Tweet-length description of the session |
+| `duration` | `string` | Duration of the session in minutes |
+| `level` | `string` | Level of the session (e.g. beginner, intermediate, advanced) |
+| `delivery` | `string` | Delivery mode of the session (e.g. in-person, remote) |
+| `resources`             | `array[object[string, string]]` \| `null` | List of resources for the session, each as `{"resource": "<url>", "description": "<description>"}` |
+| `room` | `string` \| `null` | Room where the session will be held |
+| `start` | `string (datetime ISO format)` \| `null` | Start time of the session |
+| `end` | `string (datetime ISO format)` \| `null` | End time of the session |
+| `website_url` | `string` | URL of the session on the conference website |
+| `sessions_in_parallel` | `array[string]` \| `null` | List of codes of sessions happening in parallel |
+| `sessions_after` | `array[string]` \| `null` | List of codes of sessions happening after this session |
+| `sessions_before` | `array[string]` \| `null` | List of codes of sessions happening before this session |
+| `next_session` | `string` \| `null` | Code of the next session in the same room |
+| `prev_session` | `string` \| `null` | Code of the previous session in the same room |
+
+## `speakers.json`
+
+Example speaker data JSON:
+
+```json
+{
+ "B4D5E6": {
+ "code": "B4D5E6",
+ "name": "A Speaker",
+ "biography": "Some bio",
+ "avatar": "https://pretalx.com/media/avatars/picture.jpg",
+ "slug": "a-speaker",
+ "submissions": [
+ "A1B2C3",
+ ...
+ ],
+ "affiliation": "A Company",
+ "homepage": "https://example.com",
+ "gitx": "https://github.com/B4D5E6",
+ "linkedin_url": "https://www.linkedin.com/in/B4D5E6",
+ "mastodon_url": "https://mastodon.social/@B4D5E6",
+    "twitter_url": "https://x.com/B4D5E6",
+    "website_url": "https://ep2024.europython.eu/speaker/a-speaker"
+ },
+ ...
+}
+```
+
+The fields are as follows:
+
+| Key | Type | Notes |
+|----------------|--------------------|-----------------------------------------------------------------------|
+| `code` | `string` | Unique identifier for the speaker |
+| `name` | `string` | Name of the speaker |
+| `biography` | `string` \| `null` | Biography of the speaker |
+| `avatar` | `string` | URL of the speaker's avatar |
+| `slug` | `string` | URL-friendly version of the name |
+| `submissions` | `array[string]` | List of codes of the sessions the speaker is speaking at |
+| `affiliation` | `string` \| `null` | Affiliation of the speaker |
+| `homepage` | `string` \| `null` | URL/text of the speaker's homepage |
+| `gitx` | `string` \| `null` | URL/text of the speaker's GitHub/GitLab/etc. profile |
+| `linkedin_url` | `string` \| `null` | URL of the speaker's LinkedIn profile |
+| `twitter_url` | `string` \| `null` | URL of the speaker's Twitter profile |
+| `mastodon_url` | `string` \| `null` | URL of the speaker's Mastodon profile |
+| `website_url` | `string` | URL of the speaker's profile on the conference website |
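+
+Since `speakers`, `submissions`, `sessions_in_parallel`, `sessions_after`, `sessions_before`, `next_session`, and `prev_session` all reference other entries by their codes, related objects can be joined directly across the two files. A small illustration in Python (assuming both JSON files have been downloaded locally):
+
+```python
+import json
+
+with open("sessions.json") as fd:
+    sessions = json.load(fd)
+with open("speakers.json") as fd:
+    speakers = json.load(fd)
+
+session = sessions["A1B2C3"]  # the example session code used above
+speaker_names = [speakers[code]["name"] for code in session["speakers"]]
+parallel_titles = [sessions[code]["title"] for code in (session["sessions_in_parallel"] or [])]
+```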
diff --git a/data/examples/output/sessions.json b/data/examples/europython/sessions.json
similarity index 71%
rename from data/examples/output/sessions.json
rename to data/examples/europython/sessions.json
index a5a8467..530655a 100644
--- a/data/examples/output/sessions.json
+++ b/data/examples/europython/sessions.json
@@ -5,22 +5,32 @@
"speakers": [
"F3DC8A", "ZXCVBN"
],
- "submission_type": "Talk (long session)",
+ "session_type": "Talk (long session)",
"slug": "this-is-a-test-talk-from-a-test-speaker-about-a-test-topic",
"track": "Software Engineering & Architecture",
- "state": "confirmed",
"abstract": "This is the abstract of the talk, it should be about Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec condimentum viverra ante in dignissim. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec molestie lorem enim, id dignissim mi faucibus a. Suspendisse mollis lobortis mollis. Praesent eu lorem id velit maximus blandit eget at nisl. Quisque fringilla pharetra euismod. Morbi id ante vitae tortor volutpat interdum fermentum id tortor. Vivamus ligula nisl, mattis molestie purus vel, interdum venenatis nulla. Nam suscipit scelerisque ornare. Ut consequat sem vel sapien porta pretium. Nullam non lacinia nulla, a tincidunt dui. Sed consequat nibh in nibh ornare, rhoncus sollicitudin sem lobortis. Etiam molestie est et felis sollicitudin, commodo facilisis mi vehicula. Quisque pharetra consectetur ligula, sit amet tincidunt nibh consectetur fringilla. Suspendisse eu libero sed magna malesuada bibendum sed et enim. Phasellus convallis tortor nec lectus venenatis, id tristique quam finibus.",
"tweet": "This is a short version of this talk, as a tweet.",
"duration": "45",
"level": "intermediate",
"delivery": "in-person",
+ "resources": [
+ {
+ "resource": "https://example.com/notebook.ipynb",
+ "description": "Notebook used in the talk"
+ },
+ {
+ "resource": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
+ "description": "Video of the robot in action"
+ }
+ ],
"room": null,
"start": null,
"end": null,
- "talks_in_parallel": null,
- "talks_after": null,
- "next_talk_code": null,
- "prev_talk_code": null,
+ "sessions_in_parallel": null,
+ "sessions_after": null,
+ "sessions_before": null,
+ "next_session": null,
+ "prev_session": null,
"website_url": "https://ep2024.europython.eu/session/this-is-a-test-talk-from-a-test-speaker-about-a-test-topic"
},
"B8CD4F": {
@@ -29,22 +39,23 @@
"speakers": [
"G3DC8A"
],
- "submission_type": "Talk",
+ "session_type": "Talk",
"slug": "a-talk-with-shorter-title",
"track": "PyData: LLMs",
- "state": "confirmed",
- "abstract": "This is the abstract of the shoerter talk, it should be about Lorem ipsum dolor sit amet",
+ "abstract": "This is the abstract of the shorter talk, it should be about Lorem ipsum dolor sit amet",
"tweet": "Hey, short tweet",
"duration": "30",
"level": "beginner",
"delivery": "in-person",
+ "resources": null,
"room": null,
"start": null,
"end": null,
- "talks_in_parallel": null,
- "talks_after": null,
- "next_talk_code": null,
- "prev_talk_code": null,
+ "sessions_in_parallel": null,
+ "sessions_after": null,
+ "sessions_before": null,
+ "next_session": null,
+ "prev_session": null,
"website_url": "https://ep2024.europython.eu/session/a-talk-with-shorter-title"
}
}
diff --git a/data/examples/output/speakers.json b/data/examples/europython/speakers.json
similarity index 54%
rename from data/examples/output/speakers.json
rename to data/examples/europython/speakers.json
index 23c45a6..178299a 100644
--- a/data/examples/output/speakers.json
+++ b/data/examples/europython/speakers.json
@@ -8,7 +8,10 @@
"submissions": ["A8CD3F"],
"affiliation": "A Company",
"homepage": null,
- "twitter": null,
- "mastodon": null
+ "gitx": "https://github.com/F3DC8A",
+ "linkedin_url": "https://www.linkedin.com/in/F3DC8A",
+ "mastodon_url": null,
+ "twitter_url": null,
+ "website_url": "https://ep2024.europython.eu/speaker/a-speaker"
}
}
diff --git a/data/examples/pretalx/speakers.json b/data/examples/pretalx/speakers.json
index 73d5917..7c961a0 100644
--- a/data/examples/pretalx/speakers.json
+++ b/data/examples/pretalx/speakers.json
@@ -83,7 +83,7 @@
"en": "Social (LinkedIn)"
}
},
- "answer": "https://www.linkedin.com/in/F3DC8A/",
+ "answer": "https://www.linkedin.com/in/F3DC8A",
"answer_file": null,
"submission": null,
"review": null,
diff --git a/data/examples/pretalx/submissions.json b/data/examples/pretalx/submissions.json
index de98184..1da2cf7 100644
--- a/data/examples/pretalx/submissions.json
+++ b/data/examples/pretalx/submissions.json
@@ -28,13 +28,22 @@
"abstract": "This is the abstract of the talk, it should be about Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec condimentum viverra ante in dignissim. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec molestie lorem enim, id dignissim mi faucibus a. Suspendisse mollis lobortis mollis. Praesent eu lorem id velit maximus blandit eget at nisl. Quisque fringilla pharetra euismod. Morbi id ante vitae tortor volutpat interdum fermentum id tortor. Vivamus ligula nisl, mattis molestie purus vel, interdum venenatis nulla. Nam suscipit scelerisque ornare. Ut consequat sem vel sapien porta pretium. Nullam non lacinia nulla, a tincidunt dui. Sed consequat nibh in nibh ornare, rhoncus sollicitudin sem lobortis. Etiam molestie est et felis sollicitudin, commodo facilisis mi vehicula. Quisque pharetra consectetur ligula, sit amet tincidunt nibh consectetur fringilla. Suspendisse eu libero sed magna malesuada bibendum sed et enim. Phasellus convallis tortor nec lectus venenatis, id tristique quam finibus.",
"description": null,
"duration": 45,
+ "resources": [
+ {
+ "resource": "https://example.com/notebook.ipynb",
+ "description": "Notebook used in the talk"
+ },
+ {
+ "resource": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
+ "description": "Video of the robot in action"
+ }
+ ],
"slot_count": 1,
"do_not_record": false,
"is_featured": false,
"content_locale": "en",
"slot": null,
"image": null,
- "resources": [],
"answers": [
{
"question": {
@@ -132,7 +141,7 @@
},
"track_id": 4493,
"state": "confirmed",
- "abstract": "This is the abstract of the talk, it should be about Lorem ipsum dolor sit amet",
+ "abstract": "This is the abstract of the shorter talk, it should be about Lorem ipsum dolor sit amet",
"description": null,
"duration": 30,
"slot_count": 1,
@@ -157,6 +166,19 @@
"person": null,
"options": []
},
+ {
+ "question": {
+ "id": 3412,
+ "question": {
+ "en": "Abstract as a tweet / toot"
+ }
+ },
+ "answer": "Hey, short tweet",
+ "answer_file": null,
+ "submission": "B8CD4F",
+ "review": null,
+ "person": null
+ },
{
"question": {
"id": 3412,
diff --git a/data/public/europython-2024/.gitignore b/data/public/europython-2024/.gitignore
deleted file mode 100644
index 5f55c43..0000000
--- a/data/public/europython-2024/.gitignore
+++ /dev/null
@@ -1,4 +0,0 @@
-# In this folder we have public data
-# This may in the future actually end up in this repository
-# But for now it's a bit too much noise
-*.json
diff --git a/data/raw/europython-2024/.gitignore b/data/raw/europython-2024/.gitignore
deleted file mode 100644
index a6c57f5..0000000
--- a/data/raw/europython-2024/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-*.json
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..5d7bf33
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,2 @@
+[tool.isort]
+profile = "black"
diff --git a/requirements.in b/requirements.in
index 06779cd..14d29d5 100644
--- a/requirements.in
+++ b/requirements.in
@@ -4,5 +4,6 @@ pre-commit
requests
pydantic
+python-dotenv
python-slugify
tqdm
diff --git a/requirements.txt b/requirements.txt
index 3741855..0cc46a8 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,7 +4,7 @@
#
# pip-compile
#
-annotated-types==0.6.0
+annotated-types==0.7.0
# via pydantic
attrs==23.2.0
# via wmctrl
@@ -26,7 +26,7 @@ idna==3.7
# via requests
iniconfig==2.0.0
# via pytest
-nodeenv==1.8.0
+nodeenv==1.9.0
# via pre-commit
packaging==24.0
# via pytest
@@ -48,6 +48,8 @@ pyrepl==0.9.0
# via fancycompleter
pytest==8.2.1
# via -r requirements.in
+python-dotenv==1.0.1
+ # via -r requirements.in
python-slugify==8.0.4
# via -r requirements.in
pyyaml==6.0.1
@@ -58,7 +60,7 @@ text-unidecode==1.3
# via python-slugify
tqdm==4.66.4
# via -r requirements.in
-typing-extensions==4.11.0
+typing-extensions==4.12.0
# via
# pydantic
# pydantic-core
@@ -68,6 +70,3 @@ virtualenv==20.26.2
# via pre-commit
wmctrl==0.5
# via pdbpp
-
-# The following packages are considered to be unsafe in a requirements file:
-# setuptools
diff --git a/src/config.py b/src/config.py
index 9a3944f..1ee8d0b 100644
--- a/src/config.py
+++ b/src/config.py
@@ -1,6 +1,8 @@
import os
from pathlib import Path
+from dotenv import load_dotenv
+
class Config:
event = "europython-2024"
@@ -8,6 +10,12 @@ class Config:
raw_path = Path(f"{project_root}/data/raw/{event}")
public_path = Path(f"{project_root}/data/public/{event}")
- @staticmethod
- def token():
- return os.environ["PRETALX_TOKEN"]
+ @classmethod
+ def token(cls) -> str:
+ dotenv_exists = load_dotenv(cls.project_root / ".env")
+ if (token := os.getenv("PRETALX_TOKEN")) and not dotenv_exists:
+ print("Please prefer .env file to store your token! It's more secure!")
+ return token
+ elif token is None:
+ raise ValueError("Please set your token in .env file!")
+ return token
diff --git a/src/download.py b/src/download.py
index 9afd165..8027584 100644
--- a/src/download.py
+++ b/src/download.py
@@ -1,4 +1,5 @@
import json
+from typing import Any
import requests
from tqdm import tqdm
@@ -19,11 +20,13 @@
"speakers?questions=all",
]
+Config.raw_path.mkdir(parents=True, exist_ok=True)
+
for resource in resources:
url = base_url + f"{resource}"
- res0 = []
- data = {"next": url}
+ res0: list[dict[str, Any]] = []
+ data: dict[str, Any] = {"next": url}
n = 0
pbar = tqdm(desc=f"Downloading {resource}", unit=" page", dynamic_ncols=True)
diff --git a/src/misc.py b/src/misc.py
new file mode 100644
index 0000000..2aac11c
--- /dev/null
+++ b/src/misc.py
@@ -0,0 +1,26 @@
+from enum import Enum
+
+
+class SpeakerQuestion:
+ affiliation = "Company / Organization / Educational Institution"
+ homepage = "Social (Homepage)"
+ twitter = "Social (X/Twitter)"
+ mastodon = "Social (Mastodon)"
+ linkedin = "Social (LinkedIn)"
+ gitx = "Social (Github/Gitlab)"
+
+
+class SubmissionQuestion:
+ outline = "Outline"
+ tweet = "Abstract as a tweet / toot"
+ delivery = "My presentation can be delivered"
+ level = "Expected audience expertise"
+
+
+class SubmissionState(Enum):
+ accepted = "accepted"
+ confirmed = "confirmed"
+ withdrawn = "withdrawn"
+ rejected = "rejected"
+ canceled = "canceled"
+ submitted = "submitted"
diff --git a/src/models/__init__.py b/src/models/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/src/models/europython.py b/src/models/europython.py
new file mode 100644
index 0000000..ba218de
--- /dev/null
+++ b/src/models/europython.py
@@ -0,0 +1,168 @@
+from datetime import datetime
+
+from pydantic import BaseModel, Field, computed_field, model_validator
+
+from src.config import Config
+from src.misc import SpeakerQuestion, SubmissionQuestion
+from src.models.pretalx import PretalxAnswer
+
+
+class EuroPythonSpeaker(BaseModel):
+ """
+ Model for EuroPython speaker data, transformed from Pretalx data
+ """
+
+ code: str
+ name: str
+ biography: str | None = None
+ avatar: str
+ slug: str
+ answers: list[PretalxAnswer] = Field(..., exclude=True)
+ submissions: list[str]
+
+ # Extracted
+ affiliation: str | None = None
+ homepage: str | None = None
+ twitter_url: str | None = None
+ mastodon_url: str | None = None
+ linkedin_url: str | None = None
+ gitx: str | None = None
+
+ @computed_field
+ def website_url(self) -> str:
+ return (
+ f"https://ep{Config.event.split('-')[1]}.europython.eu/speaker/{self.slug}"
+ )
+
+ @model_validator(mode="before")
+ @classmethod
+ def extract_answers(cls, values) -> dict:
+ answers = [PretalxAnswer.model_validate(ans) for ans in values["answers"]]
+
+ for answer in answers:
+ if answer.question_text == SpeakerQuestion.affiliation:
+ values["affiliation"] = answer.answer_text
+
+ if answer.question_text == SpeakerQuestion.homepage:
+ values["homepage"] = answer.answer_text
+
+ if answer.question_text == SpeakerQuestion.twitter:
+ values["twitter_url"] = cls.extract_twitter_url(
+ answer.answer_text.strip().split()[0]
+ )
+
+ if answer.question_text == SpeakerQuestion.mastodon:
+ values["mastodon_url"] = cls.extract_mastodon_url(
+ answer.answer_text.strip().split()[0]
+ )
+
+ if answer.question_text == SpeakerQuestion.linkedin:
+ values["linkedin_url"] = cls.extract_linkedin_url(
+ answer.answer_text.strip().split()[0]
+ )
+
+ if answer.question_text == SpeakerQuestion.gitx:
+ values["gitx"] = answer.answer_text.strip().split()[0]
+
+ return values
+
+ @staticmethod
+ def extract_twitter_url(text: str) -> str:
+ """
+ Extract the Twitter URL from the answer
+ """
+ if text.startswith("@"):
+ twitter_url = f"https://x.com/{text[1:]}"
+ elif not text.startswith(("https://", "http://", "www.")):
+ twitter_url = f"https://x.com/{text}"
+ else:
+ twitter_url = (
+ f"https://{text.removeprefix('https://').removeprefix('http://')}"
+ )
+
+ return twitter_url.split("?")[0]
+
+ @staticmethod
+ def extract_mastodon_url(text: str) -> str:
+ """
+ Extract the Mastodon URL from the answer, handle @username@instance format
+ """
+ if not text.startswith(("https://", "http://")) and text.count("@") == 2:
+ mastodon_url = f"https://{text.split('@')[2]}/@{text.split('@')[1]}"
+ else:
+ mastodon_url = (
+ f"https://{text.removeprefix('https://').removeprefix('http://')}"
+ )
+
+ return mastodon_url.split("?")[0]
+
+ @staticmethod
+ def extract_linkedin_url(text: str) -> str:
+ """
+ Extract the LinkedIn URL from the answer
+ """
+ if text.startswith("in/"):
+ linkedin_url = f"https://linkedin.com/{text}"
+ elif not text.startswith(("https://", "http://", "www.")):
+ linkedin_url = f"https://linkedin.com/in/{text}"
+ else:
+ linkedin_url = (
+ f"https://{text.removeprefix('https://').removeprefix('http://')}"
+ )
+
+ return linkedin_url.split("?")[0]
+
+
+class EuroPythonSession(BaseModel):
+ """
+ Model for EuroPython session data, transformed from Pretalx data
+ """
+
+ code: str
+ title: str
+ speakers: list[str]
+ session_type: str
+ slug: str
+ track: str | None = None
+ abstract: str = ""
+ tweet: str = ""
+ duration: str = ""
+ level: str = ""
+ delivery: str = ""
+ resources: list[dict[str, str]] | None = None
+ room: str | None = None
+ start: datetime | None = None
+ end: datetime | None = None
+ answers: list[PretalxAnswer] = Field(..., exclude=True)
+ sessions_in_parallel: list[str] | None = None
+ sessions_after: list[str] | None = None
+ sessions_before: list[str] | None = None
+ next_session: str | None = None
+ prev_session: str | None = None
+
+ @computed_field
+ def website_url(self) -> str:
+ return (
+ f"https://ep{Config.event.split('-')[1]}.europython.eu/session/{self.slug}"
+ )
+
+ @model_validator(mode="before")
+ @classmethod
+ def extract_answers(cls, values) -> dict:
+ answers = [PretalxAnswer.model_validate(ans) for ans in values["answers"]]
+
+ for answer in answers:
+ # TODO if we need any other questions
+ if answer.question_text == SubmissionQuestion.tweet:
+ values["tweet"] = answer.answer_text
+
+ if answer.question_text == SubmissionQuestion.delivery:
+ if "in-person" in answer.answer_text:
+ values["delivery"] = "in-person"
+ else:
+ values["delivery"] = "remote"
+
+ if answer.question_text == SubmissionQuestion.level:
+ values["level"] = answer.answer_text.lower()
+
+ return values
diff --git a/src/models/pretalx.py b/src/models/pretalx.py
new file mode 100644
index 0000000..ea19b42
--- /dev/null
+++ b/src/models/pretalx.py
@@ -0,0 +1,109 @@
+from datetime import datetime
+
+from pydantic import BaseModel, Field, field_validator, model_validator
+
+from src.misc import SubmissionState
+
+
+class PretalxAnswer(BaseModel):
+ question_text: str
+ answer_text: str
+ answer_file: str | None = None
+ submission_id: str | None = None
+ speaker_id: str | None = None
+
+ @model_validator(mode="before")
+ @classmethod
+ def extract(cls, values) -> dict:
+ values["question_text"] = values["question"]["question"]["en"]
+ values["answer_text"] = values["answer"]
+ values["answer_file"] = values["answer_file"]
+ values["submission_id"] = values["submission"]
+ values["speaker_id"] = values["person"]
+ return values
+
+
+class PretalxSlot(BaseModel):
+ room: str | None = None
+ start: datetime | None = None
+ end: datetime | None = None
+
+ @field_validator("room", mode="before")
+ @classmethod
+ def handle_localized(cls, v) -> str | None:
+ if isinstance(v, dict):
+ return v.get("en")
+ return v
+
+
+class PretalxSpeaker(BaseModel):
+ """
+ Model for Pretalx speaker data
+ """
+
+ code: str
+ name: str
+ biography: str | None = None
+ avatar: str
+ submissions: list[str]
+ answers: list[PretalxAnswer]
+
+
+class PretalxSubmission(BaseModel):
+ """
+ Model for Pretalx submission data
+ """
+
+ code: str
+ title: str
+ speakers: list[str] # We only want the code, not the full info
+ submission_type: str
+ track: str | None = None
+ state: SubmissionState
+ abstract: str = ""
+ duration: str = ""
+ resources: list[dict[str, str]] | None = None
+ answers: list[PretalxAnswer]
+ slot: PretalxSlot | None = Field(..., exclude=True)
+
+ # Extracted from slot data
+ room: str | None = None
+ start: datetime | None = None
+ end: datetime | None = None
+
+ @field_validator("submission_type", "track", mode="before")
+ @classmethod
+ def handle_localized(cls, v) -> str | None:
+ if isinstance(v, dict):
+ return v.get("en")
+ return v
+
+ @field_validator("duration", mode="before")
+ @classmethod
+ def duration_to_string(cls, v) -> str:
+ if isinstance(v, int):
+ return str(v)
+ return v
+
+ @field_validator("resources", mode="before")
+ @classmethod
+ def handle_resources(cls, v) -> list[dict[str, str]] | None:
+ return v or None
+
+ @model_validator(mode="before")
+ @classmethod
+ def process_values(cls, values) -> dict:
+ values["speakers"] = sorted([s["code"] for s in values["speakers"]])
+
+ # Set slot information
+ if values.get("slot"):
+ slot = PretalxSlot.model_validate(values["slot"])
+ values["room"] = slot.room
+ values["start"] = slot.start
+ values["end"] = slot.end
+
+ return values
+
+ @property
+ def is_publishable(self) -> bool:
+ return self.state in (SubmissionState.accepted, SubmissionState.confirmed)
diff --git a/src/transform.py b/src/transform.py
index 6c42b9e..6bbfbaf 100644
--- a/src/transform.py
+++ b/src/transform.py
@@ -1,241 +1,38 @@
-import json
-from datetime import datetime
-
-from pydantic import BaseModel, Field, model_validator
-from slugify import slugify
+import sys
from src.config import Config
+from src.utils.parse import Parse
+from src.utils.timing_relationships import TimingRelationships
+from src.utils.transform import Transform
+from src.utils.utils import Utils
+if __name__ == "__main__":
+ print(f"Parsing the data from {Config.raw_path}...")
+ pretalx_submissions = Parse.publishable_submissions(
+ Config.raw_path / "submissions_latest.json"
+ )
+ pretalx_speakers = Parse.publishable_speakers(
+ Config.raw_path / "speakers_latest.json", pretalx_submissions.keys()
+ )
-class SpeakerQuestion:
- affiliation = "Company / Organization / Educational Institution"
- homepage = "Social (Homepage)"
- twitter = "Social (X/Twitter)"
- mastodon = "Social (Mastodon)"
-
-
-class SubmissionQuestion:
- outline = "Outline"
- tweet = "Abstract as a tweet / toot"
- delivery = "My presentation can be delivered"
- level = "Expected audience expertise"
-
-
-class SubmissionState:
- accepted = "accepted"
- confirmed = "confirmed"
- withdrawn = "withdrawn"
-
-
-class PretalxAnswer(BaseModel):
- question_text: str
- answer_text: str
- answer_file: str | None
- submission_id: str | None
- speaker_id: str | None
-
- @model_validator(mode="before")
- @classmethod
- def extract(cls, values):
- values["question_text"] = values["question"]["question"]["en"]
- values["answer_text"] = values["answer"]
- values["answer_file"] = values["answer_file"]
- values["submission_id"] = values["submission"]
- values["speaker_id"] = values["person"]
- return values
-
-
-class PretalxSpeaker(BaseModel):
- code: str
- name: str
- biography: str | None
- avatar: str | None
- slug: str
- answers: list[PretalxAnswer] = Field(..., exclude=True)
- submissions: list[str]
-
- # Extracted
- affiliation: str | None = None
- homepage: str | None = None
- twitter: str | None = None
- mastodon: str | None = None
-
- @model_validator(mode="before")
- @classmethod
- def extract(cls, values):
- values["slug"] = slugify(values["name"])
-
- answers = [PretalxAnswer.model_validate(ans) for ans in values["answers"]]
-
- for answer in answers:
- if answer.question_text == SpeakerQuestion.affiliation:
- values["affiliation"] = answer.answer_text
-
- if answer.question_text == SpeakerQuestion.homepage:
- values["homepage"] = answer.answer_text
-
- # NOTE: in practice the format of the data here is different,
- # depending on the speaker. We could fix this here by parsing the
- # the answer_text to some standardised format (either @handle or
- # https://twitter.com/handle url, etc)
- if answer.question_text == SpeakerQuestion.twitter:
- values["twitter"] = answer.answer_text
-
- if answer.question_text == SpeakerQuestion.mastodon:
- values["mastodon"] = answer.answer_text
-
- return values
-
-
-class PretalxSubmission(BaseModel):
- code: str
- title: str
- speakers: list[str] # We only want the code, not the full info
- submission_type: str
- slug: str
- track: str | None
- state: str
- abstract: str
- answers: list[PretalxAnswer] = Field(..., exclude=True)
- tweet: str = ""
- duration: str
-
- level: str = ""
- delivery: str | None = ""
-
- # This is embedding a slot inside a submission for easier lookup later
- room: str | None = None
- start: datetime | None = None
- end: datetime | None = None
-
- # TODO: once we have schedule data then we can prefill those in the code here
- talks_in_parallel: list[str] | None = None
- talks_after: list[str] | None = None
- next_talk_code: str | None = None
- prev_talk_code: str | None = None
-
- website_url: str | None = None
-
- @model_validator(mode="before")
- @classmethod
- def extract(cls, values):
- # # SubmissionType and Track have localised names. For this project we
- # # only care about their english versions, so we can extract them here
- for field in ["submission_type", "track"]:
- if values[field] is None:
- continue
- else:
- # In 2024 some of those are localised, and some are not.
- # Instead of figuring out why and fixing the data, there's this
- # hack:
- if isinstance(values[field], dict):
- values[field] = values[field]["en"]
-
- values["speakers"] = sorted([s["code"] for s in values["speakers"]])
-
- answers = [PretalxAnswer.model_validate(ans) for ans in values["answers"]]
-
- for answer in answers:
- # TODO if we need any other questions
- if answer.question_text == SubmissionQuestion.tweet:
- values["tweet"] = answer.answer_text
-
- if answer.question_text == SubmissionQuestion.delivery:
- if "in-person" in answer.answer_text:
- values["delivery"] = "in-person"
- else:
- values["delivery"] = "remote"
-
- if answer.question_text == SubmissionQuestion.level:
- values["level"] = answer.answer_text.lower()
-
- # Convert duration to string for model validation
- if isinstance(values["duration"], int):
- values["duration"] = str(values["duration"])
-
- slug = slugify(values["title"])
- values["slug"] = slug
- values["website_url"] = f"https://ep2024.europython.eu/session/{slug}"
-
- return values
-
- @property
- def is_accepted(self):
- return self.state == SubmissionState.accepted
-
- @property
- def is_confirmed(self):
- return self.state == SubmissionState.confirmed
-
- @property
- def is_publishable(self):
- return self.is_accepted or self.is_confirmed
-
-
-def parse_submissions() -> list[PretalxSubmission]:
- """
- Returns only confirmed talks
- """
- with open(Config.raw_path / "submissions_latest.json") as fd:
- js = json.load(fd)
- subs = [PretalxSubmission.model_validate(item) for item in js]
- return subs
-
-
-def parse_speakers() -> list[PretalxSpeaker]:
- """
- Returns only speakers with confirmed talks
- """
- with open(Config.raw_path / "speakers_latest.json") as fd:
- js = json.load(fd)
- speakers = [PretalxSpeaker.model_validate(item) for item in js]
- return speakers
-
-
-def publishable_submissions() -> dict[str, PretalxSubmission]:
- return {s.code: s for s in parse_submissions() if s.is_publishable}
-
-
-def publishable_speakers(accepted_proposals: set[str]) -> dict[str, PretalxSpeaker]:
- sp = parse_speakers()
- output = {}
- for speaker in sp:
- accepted = set(speaker.submissions) & accepted_proposals
- if accepted:
- # Overwrite with only the accepted proposals
- speaker.submissions = list(accepted)
- output[speaker.code] = speaker
-
- return output
-
-
-def save_publishable_sessions():
- path = Config.public_path / "sessions.json"
-
- publishable = publishable_submissions()
-
- data = {k: v.model_dump() for k, v in publishable.items()}
- with open(path, "w") as fd:
- json.dump(data, fd, indent=2)
-
-
-def save_publishable_speakers():
- path = Config.public_path / "speakers.json"
-
- publishable = publishable_submissions()
- speakers = publishable_speakers(publishable.keys())
-
- data = {k: v.model_dump() for k, v in speakers.items()}
- with open(path, "w") as fd:
- json.dump(data, fd, indent=2)
-
+ print("Computing timing relationships...")
+ TimingRelationships.compute(pretalx_submissions.values())
-if __name__ == "__main__":
- print("Checking for duplicate slugs...")
- assert len(set(s.slug for s in publishable_submissions().values())) == len(
- publishable_submissions()
+ print("Transforming the data...")
+ ep_sessions = Transform.pretalx_submissions_to_europython_sessions(
+ pretalx_submissions
)
- print("Saving publishable data...")
- save_publishable_sessions()
- save_publishable_speakers()
- print("Done")
+ ep_speakers = Transform.pretalx_speakers_to_europython_speakers(pretalx_speakers)
+
+ # Warn about duplicates if the flag is set
+ if len(sys.argv) > 1 and sys.argv[1] == "--warn-dupes":
+ Utils.warn_duplicates(
+ session_attributes_to_check=["title"],
+ speaker_attributes_to_check=["name"],
+ sessions_to_check=ep_sessions,
+ speakers_to_check=ep_speakers,
+ )
+
+ print(f"Writing the data to {Config.public_path}...")
+ Utils.write_to_file(Config.public_path / "sessions.json", ep_sessions)
+ Utils.write_to_file(Config.public_path / "speakers.json", ep_speakers)
diff --git a/src/utils/__init__.py b/src/utils/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/src/utils/parse.py b/src/utils/parse.py
new file mode 100644
index 0000000..f1cce05
--- /dev/null
+++ b/src/utils/parse.py
@@ -0,0 +1,45 @@
+import json
+from collections.abc import KeysView
+from pathlib import Path
+
+from src.models.pretalx import PretalxSpeaker, PretalxSubmission
+from src.utils.utils import Utils
+
+
+class Parse:
+ @staticmethod
+ def publishable_submissions(input_file: Path | str) -> dict[str, PretalxSubmission]:
+ """
+ Returns only publishable submissions
+ """
+ with open(input_file) as fd:
+ js = json.load(fd)
+ all_submissions = [PretalxSubmission.model_validate(s) for s in js]
+ publishable_submissions = [s for s in all_submissions if s.is_publishable]
+ publishable_submissions_by_code = {
+ s.code: s for s in publishable_submissions
+ }
+
+ return publishable_submissions_by_code
+
+ @staticmethod
+ def publishable_speakers(
+ input_file: Path | str,
+ publishable_sessions_keys: KeysView[str],
+ ) -> dict[str, PretalxSpeaker]:
+ """
+ Returns only speakers with publishable sessions
+ """
+ with open(input_file) as fd:
+ js = json.load(fd)
+ all_speakers = [PretalxSpeaker.model_validate(s) for s in js]
+ speakers_with_publishable_sessions = [
+ s
+ for s in all_speakers
+ if Utils.publishable_sessions_of_speaker(s, publishable_sessions_keys)
+ ]
+ publishable_speakers_by_code = {
+ s.code: s for s in speakers_with_publishable_sessions
+ }
+
+ return publishable_speakers_by_code
diff --git a/src/utils/timing_relationships.py b/src/utils/timing_relationships.py
new file mode 100644
index 0000000..ac6add5
--- /dev/null
+++ b/src/utils/timing_relationships.py
@@ -0,0 +1,193 @@
+from collections.abc import ValuesView
+
+from src.models.pretalx import PretalxSubmission
+
+
+class TimingRelationships:
+ all_sessions_in_parallel: dict[str, list[str]] = {}
+ all_sessions_after: dict[str, list[str]] = {}
+ all_sessions_before: dict[str, list[str]] = {}
+ all_next_session: dict[str, str | None] = {}
+ all_prev_session: dict[str, str | None] = {}
+
+ @classmethod
+ def compute(
+ cls, all_sessions: ValuesView[PretalxSubmission] | list[PretalxSubmission]
+ ) -> None:
+ for session in all_sessions:
+ if not session.start or not session.end:
+ continue
+
+ sessions_in_parallel = cls.compute_sessions_in_parallel(
+ session, all_sessions
+ )
+ sessions_after = cls.compute_sessions_after(
+ session, all_sessions, sessions_in_parallel
+ )
+ sessions_before = cls.compute_sessions_before(
+ session, all_sessions, sessions_in_parallel
+ )
+
+ cls.all_sessions_in_parallel[session.code] = sessions_in_parallel
+ cls.all_sessions_after[session.code] = sessions_after
+ cls.all_sessions_before[session.code] = sessions_before
+ cls.all_next_session[session.code] = cls.compute_prev_or_next_session(
+ session, sessions_after, all_sessions
+ )
+ cls.all_prev_session[session.code] = cls.compute_prev_or_next_session(
+ session, sessions_before, all_sessions
+ )
+
+ @classmethod
+ def get_sessions_in_parallel(
+ cls, session_code: str | None = None
+ ) -> list[str] | None:
+ return cls.all_sessions_in_parallel.get(session_code)
+
+ @classmethod
+ def get_sessions_after(cls, session_code: str | None = None) -> list[str] | None:
+ return cls.all_sessions_after.get(session_code)
+
+ @classmethod
+ def get_sessions_before(cls, session_code: str | None = None) -> list[str] | None:
+ return cls.all_sessions_before.get(session_code)
+
+ @classmethod
+ def get_next_session(cls, session_code: str | None = None) -> str | None:
+ return cls.all_next_session.get(session_code)
+
+ @classmethod
+ def get_prev_session(cls, session_code: str | None = None) -> str | None:
+ return cls.all_prev_session.get(session_code)
+
+ @staticmethod
+ def compute_sessions_in_parallel(
+ session: PretalxSubmission,
+ all_sessions: ValuesView[PretalxSubmission] | list[PretalxSubmission],
+ ) -> list[str]:
+ sessions_parallel = []
+ for other_session in all_sessions:
+ if (
+ other_session.code == session.code
+ or other_session.start is None
+ or session.start is None
+ ):
+ continue
+
+ # If they intersect, they are in parallel
+ if other_session.start < session.end and other_session.end > session.start:
+ sessions_parallel.append(other_session.code)
+
+ return sessions_parallel
+
+ @staticmethod
+ def compute_sessions_after(
+ session: PretalxSubmission,
+ all_sessions: ValuesView[PretalxSubmission] | list[PretalxSubmission],
+ sessions_in_parallel: list[str],
+ ) -> list[str]:
+ # Sort sessions based on start time, early first
+ all_sessions_sorted = sorted(
+ all_sessions, key=lambda x: (x.start is None, x.start)
+ )
+
+ # Filter out sessions
+ remaining_sessions = [
+ other_session
+ for other_session in all_sessions_sorted
+ if other_session.start is not None
+ and other_session.start >= session.end
+ and other_session.code not in sessions_in_parallel
+ and other_session.code != session.code
+ and other_session.start.day == session.start.day
+ and not other_session.submission_type
+ == session.submission_type
+ == "Announcements"
+ ]
+
+ # Add sessions to the list if they are in different rooms
+ seen_rooms = set()
+ unique_sessions: list[PretalxSubmission] = []
+
+ for other_session in remaining_sessions:
+ if other_session.room not in seen_rooms:
+ unique_sessions.append(other_session)
+ seen_rooms.add(other_session.room)
+
+ # If there is a keynote next, only show that
+ if any(s.submission_type == "Keynote" for s in unique_sessions):
+ unique_sessions = [
+ s for s in unique_sessions if s.submission_type == "Keynote"
+ ]
+
+ # Set the next sessions in all rooms
+ sessions_after = [s.code for s in unique_sessions]
+
+ return sessions_after
+
+ @staticmethod
+ def compute_sessions_before(
+ session: PretalxSubmission,
+ all_sessions: ValuesView[PretalxSubmission] | list[PretalxSubmission],
+ sessions_in_parallel: list[str],
+ ) -> list[str]:
+ # Sort sessions based on start time, late first
+ all_sessions_sorted = sorted(
+ all_sessions,
+ key=lambda x: (x.start is None, x.start),
+ reverse=True,
+ )
+
+ remaining_sessions = [
+ other_session
+ for other_session in all_sessions_sorted
+ if other_session.start is not None
+ and other_session.code not in sessions_in_parallel
+ and other_session.start <= session.start
+ and other_session.code != session.code
+ and other_session.start.day == session.start.day
+ and other_session.submission_type != "Announcements"
+ ]
+
+ seen_rooms = set()
+ unique_sessions: list[PretalxSubmission] = []
+
+ for other_session in remaining_sessions:
+ if other_session.room not in seen_rooms:
+ unique_sessions.append(other_session)
+ seen_rooms.add(other_session.room)
+
+ sessions_before = [session.code for session in unique_sessions]
+
+ return sessions_before
+
+ @staticmethod
+ def compute_prev_or_next_session(
+ session: PretalxSubmission,
+ sessions_before_or_after: list[str],
+ all_sessions: ValuesView[PretalxSubmission] | list[PretalxSubmission],
+ ) -> str | None:
+ """
+ Compute next_session or prev_session based on the given sessions_before_or_after.
+ If passed sessions_before, it will return prev_session.
+ If passed sessions_after, it will return next_session.
+
+ Returns the previous or next session in the same room or a keynote.
+ """
+ if not sessions_before_or_after:
+ return None
+
+ sessions_before_or_after_object = [
+ s for s in all_sessions if s.code in sessions_before_or_after
+ ]
+
+ session_in_same_room = None
+ for other_session in sessions_before_or_after_object:
+ if (
+ other_session.room == session.room
+ or other_session.submission_type == "Keynote"
+ ):
+ session_in_same_room = other_session.code
+ break
+
+ return session_in_same_room
diff --git a/src/utils/transform.py b/src/utils/transform.py
new file mode 100644
index 0000000..34b26bc
--- /dev/null
+++ b/src/utils/transform.py
@@ -0,0 +1,83 @@
+from src.models.europython import EuroPythonSession, EuroPythonSpeaker
+from src.models.pretalx import PretalxSpeaker, PretalxSubmission
+from src.utils.timing_relationships import TimingRelationships
+from src.utils.utils import Utils
+
+
+class Transform:
+ @staticmethod
+ def pretalx_submissions_to_europython_sessions(
+ submissions: dict[str, PretalxSubmission],
+ ) -> dict[str, EuroPythonSession]:
+ """
+ Transforms the given Pretalx submissions to EuroPython sessions
+ """
+ # Sort the submissions based on start time for deterministic slug computation
+ submissions = {
+ k: v
+ for k, v in sorted(
+ submissions.items(),
+ key=lambda item: (item[1].start is None, item[1].start),
+ )
+ }
+
+ session_code_to_slug = Utils.compute_unique_slugs_by_attribute(
+ submissions, "title"
+ )
+
+ ep_sessions = {}
+ for code, submission in submissions.items():
+ ep_session = EuroPythonSession(
+ code=submission.code,
+ title=submission.title,
+ speakers=submission.speakers,
+ session_type=submission.submission_type,
+ slug=session_code_to_slug[submission.code],
+ track=submission.track,
+ abstract=submission.abstract,
+ duration=submission.duration,
+ resources=submission.resources,
+ room=submission.room,
+ start=submission.start,
+ end=submission.end,
+ answers=submission.answers,
+ sessions_in_parallel=TimingRelationships.get_sessions_in_parallel(
+ submission.code
+ ),
+ sessions_after=TimingRelationships.get_sessions_after(submission.code),
+ sessions_before=TimingRelationships.get_sessions_before(
+ submission.code
+ ),
+ next_session=TimingRelationships.get_next_session(submission.code),
+ prev_session=TimingRelationships.get_prev_session(submission.code),
+ )
+ ep_sessions[code] = ep_session
+
+ return ep_sessions
+
+ @staticmethod
+ def pretalx_speakers_to_europython_speakers(
+ speakers: dict[str, PretalxSpeaker],
+ ) -> dict[str, EuroPythonSpeaker]:
+ """
+ Transforms the given Pretalx speakers to EuroPython speakers
+ """
+ # Sort the speakers based on code for deterministic slug computation
+ speakers = {k: v for k, v in sorted(speakers.items(), key=lambda item: item[0])}
+
+ speaker_code_to_slug = Utils.compute_unique_slugs_by_attribute(speakers, "name")
+
+ ep_speakers = {}
+ for code, speaker in speakers.items():
+ ep_speaker = EuroPythonSpeaker(
+ code=speaker.code,
+ name=speaker.name,
+ biography=speaker.biography,
+ avatar=speaker.avatar,
+ slug=speaker_code_to_slug[speaker.code],
+ answers=speaker.answers,
+ submissions=speaker.submissions,
+ )
+ ep_speakers[code] = ep_speaker
+
+ return ep_speakers
diff --git a/src/utils/utils.py b/src/utils/utils.py
new file mode 100644
index 0000000..37838cf
--- /dev/null
+++ b/src/utils/utils.py
@@ -0,0 +1,123 @@
+import json
+from collections.abc import KeysView
+from pathlib import Path
+
+from slugify import slugify
+
+from src.models.europython import EuroPythonSession, EuroPythonSpeaker
+from src.models.pretalx import PretalxSpeaker, PretalxSubmission
+
+
+class Utils:
+ @staticmethod
+ def publishable_sessions_of_speaker(
+ speaker: PretalxSpeaker, accepted_proposals: KeysView[str]
+ ) -> set[str]:
+ return set(speaker.submissions) & accepted_proposals
+
+ @staticmethod
+ def find_duplicate_attributes(
+ objects: (
+ dict[str, EuroPythonSession]
+ | dict[str, EuroPythonSpeaker]
+ | dict[str, PretalxSubmission]
+ | dict[str, PretalxSpeaker]
+ ),
+ attributes: list[str],
+ ) -> dict[str, list[str]]:
+ """
+ Find duplicates in the given objects based on the given attributes
+
+ Returns: dict[attribute_value, list[object_code]]
+ """
+ duplicates: dict[str, list[str]] = {}
+ for obj in objects.values():
+ for attribute in attributes:
+ value = getattr(obj, attribute)
+ if value in duplicates:
+ duplicates[value].append(obj.code)
+ else:
+ duplicates[value] = [obj.code]
+
+ return duplicates
+
+ @staticmethod
+ def replace_duplicate_slugs(code_to_slug: dict[str, str]) -> dict[str, str]:
+ slug_count: dict[str, int] = {}
+ seen_slugs: set[str] = set()
+
+ for code, slug in code_to_slug.items():
+ original_slug = slug
+
+ if original_slug in seen_slugs:
+ if original_slug in slug_count:
+ slug_count[original_slug] += 1
+ else:
+ slug_count[original_slug] = 1
+ code_to_slug[code] = f"{original_slug}-{slug_count[original_slug]}"
+ else:
+ seen_slugs.add(original_slug)
+
+ return code_to_slug
+
+ @staticmethod
+ def warn_duplicates(
+ session_attributes_to_check: list[str],
+ speaker_attributes_to_check: list[str],
+ sessions_to_check: dict[str, EuroPythonSession] | dict[str, PretalxSubmission],
+ speakers_to_check: dict[str, EuroPythonSpeaker] | dict[str, PretalxSpeaker],
+ ) -> None:
+ """
+ Warns about duplicate attributes in the given objects
+ """
+ print(
+ f"Checking for duplicate {'s, '.join(session_attributes_to_check)}s in sessions..."
+ )
+ duplicate_sessions = Utils.find_duplicate_attributes(
+ sessions_to_check, session_attributes_to_check
+ )
+
+ for attribute, codes in duplicate_sessions.items():
+ if len(codes) > 1:
+ print(f"Duplicate ``{attribute}`` in sessions: {codes}")
+
+ print(
+ f"Checking for duplicate {'s, '.join(speaker_attributes_to_check)}s in speakers..."
+ )
+ duplicate_speakers = Utils.find_duplicate_attributes(
+ speakers_to_check, speaker_attributes_to_check
+ )
+
+ for attribute, codes in duplicate_speakers.items():
+ if len(codes) > 1:
+ print(f"Duplicate ``{attribute}`` in speakers: {codes}")
+
+ @staticmethod
+ def compute_unique_slugs_by_attribute(
+ objects: dict[str, PretalxSubmission] | dict[str, PretalxSpeaker],
+ attribute: str,
+ ) -> dict[str, str]:
+ """
+ Compute the slugs based on the given attribute
+ and replace the duplicate slugs with incrementing
+ numbers at the end.
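+
+        For example, two objects whose attribute values both slugify to
+        "my-talk" end up with the slugs "my-talk" and "my-talk-1".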
+
+ Returns: dict[code, slug]
+ """
+ object_code_to_slug = {}
+ for obj in objects.values():
+ object_code_to_slug[obj.code] = slugify(getattr(obj, attribute))
+
+ return Utils.replace_duplicate_slugs(object_code_to_slug)
+
+ @staticmethod
+ def write_to_file(
+ output_file: Path | str,
+ data: dict[str, EuroPythonSession] | dict[str, EuroPythonSpeaker],
+ ) -> None:
+ with open(output_file, "w") as fd:
+ json.dump(
+ {k: json.loads(v.model_dump_json()) for k, v in data.items()},
+ fd,
+ indent=2,
+ )
diff --git a/tests/test_examples_are_up_to_date.py b/tests/test_examples_are_up_to_date.py
deleted file mode 100644
index 5a5b987..0000000
--- a/tests/test_examples_are_up_to_date.py
+++ /dev/null
@@ -1,33 +0,0 @@
-import json
-
-from src.transform import PretalxSpeaker, PretalxSubmission
-
-with open("./data/examples/pretalx/submissions.json") as fd:
- pretalx_submissions = json.load(fd)
-
-with open("./data/examples/pretalx/speakers.json") as fd:
- pretalx_speakers = json.load(fd)
-
-
-def test_sessions_example():
- assert pretalx_submissions[0]["code"] == "A8CD3F"
- pretalx = pretalx_submissions[0]
-
- transformed = PretalxSubmission.model_validate(pretalx)
-
- with open("./data/examples/output/sessions.json") as fd:
- sessions = json.load(fd)
-
- assert transformed.model_dump() == sessions["A8CD3F"]
-
-
-def test_speakers_example():
- assert pretalx_speakers[0]["code"] == "F3DC8A"
- pretalx = pretalx_speakers[0]
-
- transformed = PretalxSpeaker.model_validate(pretalx)
-
- with open("./data/examples/output/speakers.json") as fd:
- speakers = json.load(fd)
-
- assert transformed.model_dump() == speakers["F3DC8A"]
diff --git a/tests/test_social_media_extraction.py b/tests/test_social_media_extraction.py
new file mode 100644
index 0000000..48e2548
--- /dev/null
+++ b/tests/test_social_media_extraction.py
@@ -0,0 +1,33 @@
+import pytest
+
+from src.models.europython import EuroPythonSpeaker
+
+
+@pytest.mark.parametrize(
+ ("input_string", "result"),
+ [
+ ("http://mastodon.social/@username", "https://mastodon.social/@username"),
+ ("https://mastodon.social/@username", "https://mastodon.social/@username"),
+ (
+ "https://mastodon.social/@username?something=true",
+ "https://mastodon.social/@username",
+ ),
+ ("@username@mastodon.social", "https://mastodon.social/@username"),
+ ],
+)
+def test_extract_mastodon_url(input_string: str, result: str) -> None:
+ assert EuroPythonSpeaker.extract_mastodon_url(input_string) == result
+
+
+@pytest.mark.parametrize(
+ ("input_string", "result"),
+ [
+ ("username", "https://linkedin.com/in/username"),
+ ("in/username", "https://linkedin.com/in/username"),
+ ("www.linkedin.com/in/username", "https://www.linkedin.com/in/username"),
+ ("http://linkedin.com/in/username", "https://linkedin.com/in/username"),
+ ("https://linkedin.com/in/username", "https://linkedin.com/in/username"),
+ ],
+)
+def test_extract_linkedin_url(input_string: str, result: str) -> None:
+ assert EuroPythonSpeaker.extract_linkedin_url(input_string) == result
diff --git a/tests/test_transform_end_to_end.py b/tests/test_transform_end_to_end.py
new file mode 100644
index 0000000..e8315db
--- /dev/null
+++ b/tests/test_transform_end_to_end.py
@@ -0,0 +1,40 @@
+import json
+
+from src.utils.parse import Parse
+from src.utils.timing_relationships import TimingRelationships
+from src.utils.transform import Transform
+
+pretalx_submissions = Parse.publishable_submissions(
+ "./data/examples/pretalx/submissions.json"
+)
+
+
+def test_e2e_sessions() -> None:
+ TimingRelationships.compute(pretalx_submissions.values())
+
+ ep_sessions = Transform.pretalx_submissions_to_europython_sessions(
+ pretalx_submissions
+ )
+ ep_sessions_dump = {
+ k: json.loads(v.model_dump_json()) for k, v in ep_sessions.items()
+ }
+
+ with open("./data/examples/europython/sessions.json") as fd:
+ ep_sessions_expected = json.load(fd)
+
+ assert ep_sessions_dump == ep_sessions_expected
+
+
+def test_e2e_speakers() -> None:
+ pretalx_speakers = Parse.publishable_speakers(
+ "./data/examples/pretalx/speakers.json", pretalx_submissions.keys()
+ )
+ ep_speakers = Transform.pretalx_speakers_to_europython_speakers(pretalx_speakers)
+ ep_speakers_dump = {
+ k: json.loads(v.model_dump_json()) for k, v in ep_speakers.items()
+ }
+
+ with open("./data/examples/europython/speakers.json") as fd:
+ ep_speakers_expected = json.load(fd)
+
+ assert ep_speakers_dump == ep_speakers_expected