Extractions, calculations and optimizations + some documentation #6

Closed
28 commits
3b9b559
lots of stuff
egeakman May 24, 2024
a3db579
port funcs to 2024
egeakman May 25, 2024
0c50a62
update
egeakman May 25, 2024
1d106e0
oops + more readable + tell what event are we transforming
egeakman May 25, 2024
96111ab
better slug dupe check + optimize
egeakman May 25, 2024
08bcbde
add documentation
egeakman May 29, 2024
39a96e3
Update README.md
egeakman May 29, 2024
ecb1cc3
Update README.md
egeakman May 29, 2024
4276fa5
add configuration to readme
egeakman May 29, 2024
aba49d6
Use model_dump_json to be able to serialize datetime
egeakman May 29, 2024
4a0d477
Merge branch 'main' into port-to-2024
egeakman May 31, 2024
4e433ec
.env + documentation + extract more socials
egeakman May 31, 2024
fcceb66
exist_ok
egeakman Jun 1, 2024
b666971
url extraction functions
egeakman Jun 1, 2024
5798b4b
Tried to put timings under a different model
egeakman Jun 2, 2024
7818471
correct typing at some places
egeakman Jun 2, 2024
84d3387
better overall structure
egeakman Jun 2, 2024
339ba50
typing
egeakman Jun 2, 2024
df0ad5f
Add resources to the schema
egeakman Jun 2, 2024
f5e635f
Update README.md
egeakman Jun 2, 2024
66fa79f
oops missed this one
egeakman Jun 2, 2024
ee3f018
change gitx_url to gitx
egeakman Jun 2, 2024
96eb614
Add tests for mastodon and linkedin url extraction
NMertsch Jun 3, 2024
1dec5c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 3, 2024
ce1de63
better code structure
egeakman Jun 4, 2024
de3f67d
Separate files
egeakman Jun 4, 2024
d875052
naming
egeakman Jun 4, 2024
42aba10
speaker website_url
egeakman Jun 4, 2024
1 change: 0 additions & 1 deletion .pre-commit-config.yaml
@@ -23,7 +23,6 @@ repos:
hooks:
- id: ruff
- id: ruff-format
args: ["--check"]

- repo: local
hooks:
5 changes: 4 additions & 1 deletion Makefile
@@ -14,8 +14,11 @@ download:
python -m src.download

transform:
ifeq ($(WARN_DUPES), true)
python -m src.transform --warn-dupes
else
python -m src.transform

endif

all: download transform
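
The Makefile change forwards `WARN_DUPES=true` to the transform step as a `--warn-dupes` flag. How `src.transform` consumes the flag is not visible in this diff; the following is a minimal sketch, assuming an `argparse`-based entry point. The helpers `find_duplicate_slugs`, `parse_args` and `check_slugs` are hypothetical names, and the "warn instead of fail" behaviour is an assumption.

```python
import argparse
from collections import Counter
from typing import List, Optional


def find_duplicate_slugs(slugs: List[str]) -> List[str]:
    """Return every slug that occurs more than once (hypothetical helper)."""
    counts = Counter(slugs)
    return sorted(slug for slug, n in counts.items() if n > 1)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Assumption: the flag name mirrors the one passed by the Makefile.
    parser = argparse.ArgumentParser(prog="src.transform")
    parser.add_argument(
        "--warn-dupes",
        action="store_true",
        help="warn about duplicate slugs instead of raising an error",
    )
    return parser.parse_args(argv)


def check_slugs(slugs: List[str], warn_dupes: bool) -> None:
    """Warn or raise on duplicates, depending on the flag (assumed behaviour)."""
    dupes = find_duplicate_slugs(slugs)
    if not dupes:
        return
    message = f"Duplicate slugs: {', '.join(dupes)}"
    if warn_dupes:
        print(f"WARNING: {message}")
    else:
        raise ValueError(message)
```

With the Makefile above, the flag would presumably be enabled with `make transform WARN_DUPES=true`.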

41 changes: 40 additions & 1 deletion README.md
@@ -1,2 +1,41 @@
# programapi
Program API

This project downloads, processes, saves, and serves the static JSON files containing details of accepted speakers and submissions via an API.

Used by the EuroPython 2024 website and the Discord bot.

**What this project does step-by-step:**

1. Downloads the Pretalx speaker and submission data, and saves it as JSON files.
2. Transforms the JSON files into a format that is easier to work with and OK to serve publicly. This includes removing unnecessary/private fields, and adding new fields.
3. Serves the JSON files via an API.
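
Step 2 can be pictured as an allow-list filter over each raw record. The sketch below is illustrative only: `PUBLIC_SPEAKER_FIELDS` and `to_public` are hypothetical names, and the PR itself appears to rely on Pydantic models rather than plain dictionaries.

```python
from typing import Set

# Assumption: field names borrowed from the example speaker output;
# the set used by the project itself may differ.
PUBLIC_SPEAKER_FIELDS = {"code", "name", "biography", "avatar", "submissions"}


def to_public(raw: dict, allowed: Set[str] = PUBLIC_SPEAKER_FIELDS) -> dict:
    """Keep only allow-listed keys, so private fields such as `email` are dropped."""
    return {key: value for key, value in raw.items() if key in allowed}


raw_speaker = {
    "code": "B4D5E6",
    "name": "A Speaker",
    "email": "speaker@example.com",  # private, must not be served
    "submissions": ["A1B2C3"],
}
public_speaker = to_public(raw_speaker)
```

An allow-list fails safe: a new private field added upstream stays out of the public files until someone opts it in.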

## Installation

1. Clone the repository.
2. Install the dependency management tool: ``make deps/pre``
Review comment (Collaborator):

Maybe an idea for the future: what if we install programapi as a Python package and provide the following commands?

  • programapi download
  • programapi transform

By the way, do you think we can avoid saving unnecessary/private fields when downloading?

Reply (Member Author):

I thought about it, but didn't want to do it while there was a bunch of other stuff to implement first. I support it, though, and I think we can do it after this PR.

About the private/unused fields:

  • If I'm not mistaken, some fields like email and do_not_record cannot be excluded when downloading.
  • Answers can probably be excluded, but we use ?questions=all to get all the answers, so we can manage what to include/exclude in the model ad hoc.

3. Install the dependencies: ``make deps/install``
4. Set up ``pre-commit``: ``make pre-commit``
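
The `programapi download` / `programapi transform` commands suggested in the review thread above could be sketched with `argparse` subcommands. Nothing here exists in the PR: `main`, `download` and `transform` are placeholders standing in for whatever `src.download` and `src.transform` expose.

```python
import argparse
from typing import Callable, Dict, List, Optional


def download() -> str:
    # Placeholder for the logic currently run via `python -m src.download`.
    return "download"


def transform() -> str:
    # Placeholder for the logic currently run via `python -m src.transform`.
    return "transform"


COMMANDS: Dict[str, Callable[[], str]] = {
    "download": download,
    "transform": transform,
}


def main(argv: Optional[List[str]] = None) -> str:
    """Dispatch `programapi <command>` to the matching function."""
    parser = argparse.ArgumentParser(prog="programapi")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        subparsers.add_parser(name, help=f"run the {name} step")
    args = parser.parse_args(argv)
    return COMMANDS[args.command]()
```

If the project were packaged, registering `main` under `[project.scripts]` in `pyproject.toml` would be one way to expose it as a `programapi` command.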

## Configuration

You can change the event in the [``config.py``](src/config.py) file. It is set to ``europython-2024`` right now.

## Usage

- Run the whole process: ``make all``
- Run only the download process: ``make download``
- Run only the transformation process: ``make transform``

**Note:** Don't forget to set ``PRETALX_TOKEN`` in your ``.env`` file at the root of the project. And please don't make too many requests to the Pretalx API, it might get angry 🤪
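
A sketch of how the token might be read, assuming `python-dotenv` (added to `requirements.in` in this PR) loads `.env` into the environment. The `get_auth_headers` helper is a hypothetical name, not the project's own function; the `Authorization: Token <token>` format is the one described in the Pretalx API documentation.

```python
import os

try:
    # python-dotenv is a project dependency; degrade gracefully without it.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_auth_headers() -> dict:
    """Build request headers from PRETALX_TOKEN (hypothetical helper)."""
    if load_dotenv is not None:
        # Loads variables from a .env file, if one is found, without
        # overriding variables that are already set in the environment.
        load_dotenv()
    token = os.environ.get("PRETALX_TOKEN")
    if not token:
        raise RuntimeError("PRETALX_TOKEN is not set; add it to your .env file")
    return {"Authorization": f"Token {token}"}
```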

## API

The API is served at ``https://programapi24.europython.eu/2024``. It has two endpoints (for now):

- ``/speakers.json``: Returns the list of confirmed speakers.
- ``/sessions.json``: Returns the list of confirmed sessions.
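
Consuming the endpoints needs nothing beyond the standard library. In this sketch `BASE_URL` is the address given above, while `endpoint_url` and `fetch_json` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://programapi24.europython.eu/2024"


def endpoint_url(name: str) -> str:
    """Return the URL of the `speakers` or `sessions` endpoint."""
    if name not in {"speakers", "sessions"}:
        raise ValueError(f"unknown endpoint: {name}")
    return f"{BASE_URL}/{name}.json"


def fetch_json(name: str) -> dict:
    # Performs a network request; the result maps codes to records.
    with urlopen(endpoint_url(name), timeout=10) as response:
        return json.load(response)
```

Calling `fetch_json("sessions")` would return a dictionary keyed by session code, as in the examples under `data/examples/`.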

## Schema

See [this page](data/examples/README.md) for the explanations of the fields in the returned JSON files.
3 changes: 3 additions & 0 deletions data/.gitignore
@@ -0,0 +1,3 @@
# JSON files except the ones in examples/
*.json
!examples/**
139 changes: 139 additions & 0 deletions data/examples/README.md
@@ -0,0 +1,139 @@
# Explaining the output data

**Note:** Some of the fields may be `null` or empty (`""`).

## `sessions.json`

<details>
<summary>Example session data JSON</summary>

```json
{
    "A1B2C3": {
        "code": "A1B2C3",
        "title": "Example talk",
        "speakers": [
            "B4D5E6",
            ...
        ],
        "session_type": "Talk",
        "slug": "example-talk",
        "track": "Some Track",
        "state": "confirmed",
        "abstract": "This is an example talk. It is a great talk.",
        "tweet": "This is an example talk.",
        "duration": "60",
        "level": "intermediate",
        "delivery": "in-person",
        "resources": [
            {
                "resource": "https://example.com/notebook.ipynb",
                "description": "Notebook used in the talk"
            },
            {
                "resource": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
                "description": "Video of the robot in action"
            },
            ...
        ],
        "room": "South Hall 2A",
        "start": "2024-07-10T14:00:00+02:00",
        "end": "2024-07-10T15:00:00+02:00",
        "website_url": "https://ep2024.europython.eu/session/example-talk/",
        "sessions_in_parallel": [
            "F7G8H9",
            ...
        ],
        "sessions_after": [
            "I0J1K2",
            ...
        ],
        "sessions_before": [
            "L3M4N5",
            ...
        ],
        "next_session": "O6P7Q8",
        "prev_session": "R9S0T1"
    },
    ...
}
```
</details>

&nbsp;

The fields are as follows:

| Key | Type | Notes |
|------------------------|-------------------------------------------|---------------------------------------------------------------|
| `code` | `string` | Unique identifier for the session |
| `title` | `string` | Title of the session |
| `speakers` | `array[string]` | List of codes of the speakers |
| `session_type` | `string` | Type of the session (e.g. Talk, Workshop, Poster, etc.) |
| `slug` | `string` | URL-friendly version of the title |
| `track` | `string` \| `null` | Track of the session (e.g. PyData, Web, etc.) |
| `abstract` | `string` | Abstract of the session |
| `tweet` | `string` | Tweet-length description of the session |
| `duration` | `string` | Duration of the session in minutes |
| `level` | `string` | Level of the session (e.g. beginner, intermediate, advanced) |
| `delivery` | `string` | Delivery mode of the session (e.g. in-person, remote) |
| `resources` | `array[object[string, string]]` \| `null` | List of resources for the session: `{"resource": <url>, "description": <description>}` |
| `room` | `string` \| `null` | Room where the session will be held |
| `start` | `string (datetime ISO format)` \| `null` | Start time of the session |
| `end` | `string (datetime ISO format)` \| `null` | End time of the session |
| `website_url` | `string` | URL of the session on the conference website |
| `sessions_in_parallel` | `array[string]` \| `null` | List of codes of sessions happening in parallel |
| `sessions_after` | `array[string]` \| `null` | List of codes of sessions happening after this session |
| `sessions_before` | `array[string]` \| `null` | List of codes of sessions happening before this session |
| `next_session` | `string` \| `null` | Code of the next session in the same room |
| `prev_session` | `string` \| `null` | Code of the previous session in the same room |
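
Because `start` and `end` are ISO 8601 strings and `duration` is a string of minutes, a consumer will usually convert them before doing arithmetic. A minimal sketch using the values from the example above; `scheduled_minutes` is a hypothetical helper:

```python
from datetime import datetime
from typing import Optional


def scheduled_minutes(session: dict) -> Optional[int]:
    """Length of the scheduled slot in minutes, or None if unscheduled."""
    if session.get("start") is None or session.get("end") is None:
        return None
    start = datetime.fromisoformat(session["start"])
    end = datetime.fromisoformat(session["end"])
    return int((end - start).total_seconds() // 60)


session = {
    "duration": "60",
    "start": "2024-07-10T14:00:00+02:00",
    "end": "2024-07-10T15:00:00+02:00",
}
minutes = scheduled_minutes(session)
matches_duration = minutes == int(session["duration"])
```

Handling `None` explicitly matters because `room`, `start` and `end` are `null` until the schedule is published.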

&nbsp;

## `speakers.json`

<details>
<summary>Example speaker data JSON</summary>

```json
{
    "B4D5E6": {
        "code": "B4D5E6",
        "name": "A Speaker",
        "biography": "Some bio",
        "avatar": "https://pretalx.com/media/avatars/picture.jpg",
        "slug": "a-speaker",
        "submissions": [
            "A1B2C3",
            ...
        ],
        "affiliation": "A Company",
        "homepage": "https://example.com",
        "gitx": "https://github.com/B4D5E6",
        "linkedin_url": "https://www.linkedin.com/in/B4D5E6",
        "mastodon_url": "https://mastodon.social/@B4D5E6",
        "twitter_url": "https://x.com/B4D5E6"
    },
    ...
}
```
</details>

&nbsp;

The fields are as follows:

| Key | Type | Notes |
|----------------|--------------------|-----------------------------------------------------------------------|
| `code` | `string` | Unique identifier for the speaker |
| `name` | `string` | Name of the speaker |
| `biography` | `string` \| `null` | Biography of the speaker |
| `avatar` | `string` | URL of the speaker's avatar |
| `slug` | `string` | URL-friendly version of the name |
| `submissions` | `array[string]` | List of codes of the sessions the speaker is speaking at |
| `affiliation` | `string` \| `null` | Affiliation of the speaker |
| `homepage` | `string` \| `null` | URL/text of the speaker's homepage |
| `gitx` | `string` \| `null` | URL/text of the speaker's GitHub/GitLab/etc. profile |
| `linkedin_url` | `string` \| `null` | URL of the speaker's LinkedIn profile |
| `twitter_url` | `string` \| `null` | URL of the speaker's Twitter profile |
| `mastodon_url` | `string` \| `null` | URL of the speaker's Mastodon profile |
| `website_url` | `string` | URL of the speaker's profile on the conference website |
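
The two files reference each other by code: a speaker's `submissions` lists session codes, and a session's `speakers` lists speaker codes. A small sketch of resolving one direction, with `sessions_for_speaker` as a hypothetical helper and the data trimmed from the examples above:

```python
from typing import List


def sessions_for_speaker(speaker_code: str, speakers: dict, sessions: dict) -> List[str]:
    """Return the titles of the sessions a speaker is listed on."""
    codes = speakers[speaker_code]["submissions"]
    # Skip codes that are missing from sessions.json instead of raising.
    return [sessions[code]["title"] for code in codes if code in sessions]


speakers = {
    "B4D5E6": {"code": "B4D5E6", "name": "A Speaker", "submissions": ["A1B2C3"]},
}
sessions = {
    "A1B2C3": {"code": "A1B2C3", "title": "Example talk", "speakers": ["B4D5E6"]},
}
titles = sessions_for_speaker("B4D5E6", speakers, sessions)
```
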
Original file line number Diff line number Diff line change
@@ -5,22 +5,32 @@
"speakers": [
"F3DC8A", "ZXCVBN"
],
"submission_type": "Talk (long session)",
"session_type": "Talk (long session)",
"slug": "this-is-a-test-talk-from-a-test-speaker-about-a-test-topic",
"track": "Software Engineering & Architecture",
"state": "confirmed",
"abstract": "This is the abstract of the talk, it should be about Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec condimentum viverra ante in dignissim. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec molestie lorem enim, id dignissim mi faucibus a. Suspendisse mollis lobortis mollis. Praesent eu lorem id velit maximus blandit eget at nisl. Quisque fringilla pharetra euismod. Morbi id ante vitae tortor volutpat interdum fermentum id tortor. Vivamus ligula nisl, mattis molestie purus vel, interdum venenatis nulla. Nam suscipit scelerisque ornare. Ut consequat sem vel sapien porta pretium. Nullam non lacinia nulla, a tincidunt dui. Sed consequat nibh in nibh ornare, rhoncus sollicitudin sem lobortis. Etiam molestie est et felis sollicitudin, commodo facilisis mi vehicula. Quisque pharetra consectetur ligula, sit amet tincidunt nibh consectetur fringilla. Suspendisse eu libero sed magna malesuada bibendum sed et enim. Phasellus convallis tortor nec lectus venenatis, id tristique quam finibus.",
"tweet": "This is a short version of this talk, as a tweet.",
"duration": "45",
"level": "intermediate",
"delivery": "in-person",
"resources": [
{
"resource": "https://example.com/notebook.ipynb",
"description": "Notebook used in the talk"
},
{
"resource": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"description": "Video of the robot in action"
}
],
"room": null,
"start": null,
"end": null,
"talks_in_parallel": null,
"talks_after": null,
"next_talk_code": null,
"prev_talk_code": null,
"sessions_in_parallel": null,
"sessions_after": null,
"sessions_before": null,
"next_session": null,
"prev_session": null,
"website_url": "https://ep2024.europython.eu/session/this-is-a-test-talk-from-a-test-speaker-about-a-test-topic"
},
"B8CD4F": {
@@ -29,22 +39,23 @@
"speakers": [
"G3DC8A"
],
"submission_type": "Talk",
"session_type": "Talk",
"slug": "a-talk-with-shorter-title",
"track": "PyData: LLMs",
"state": "confirmed",
"abstract": "This is the abstract of the shoerter talk, it should be about Lorem ipsum dolor sit amet",
"abstract": "This is the abstract of the shorter talk, it should be about Lorem ipsum dolor sit amet",
"tweet": "Hey, short tweet",
"duration": "30",
"level": "beginner",
"delivery": "in-person",
"resources": null,
"room": null,
"start": null,
"end": null,
"talks_in_parallel": null,
"talks_after": null,
"next_talk_code": null,
"prev_talk_code": null,
"sessions_in_parallel": null,
"sessions_after": null,
"sessions_before": null,
"next_session": null,
"prev_session": null,
"website_url": "https://ep2024.europython.eu/session/a-talk-with-shorter-title"
}
}
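
The example above suggests that several session keys were renamed (`submission_type` to `session_type`, the `talks_*` keys to `sessions_*`, and so on) and that `sessions_before` is new. For consumers still holding data in the old shape, a migration could be sketched as follows. The mapping is inferred from the diff, whose +/- markers are not preserved here, so treat it as an assumption; `RENAMED_KEYS` and `upgrade_session` are hypothetical names.

```python
# Assumption: old -> new key names inferred from the example diff above.
RENAMED_KEYS = {
    "submission_type": "session_type",
    "talks_in_parallel": "sessions_in_parallel",
    "talks_after": "sessions_after",
    "next_talk_code": "next_session",
    "prev_talk_code": "prev_session",
}


def upgrade_session(old: dict) -> dict:
    """Rename legacy keys and add the `sessions_before` key if absent."""
    new = {RENAMED_KEYS.get(key, key): value for key, value in old.items()}
    new.setdefault("sessions_before", None)
    return new


upgraded = upgrade_session({"submission_type": "Talk", "talks_after": None})
```
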
Original file line number Diff line number Diff line change
@@ -8,7 +8,10 @@
"submissions": ["A8CD3F"],
"affiliation": "A Company",
"homepage": null,
"twitter": null,
"mastodon": null
"gitx": "https://github.com/F3DC8A",
"linkedin_url": "https://www.linkedin.com/in/F3DC8A",
"mastodon_url": null,
"twitter_url": null,
"website_url": "https://ep2024.europython.eu/speaker/a-speaker"
}
}
2 changes: 1 addition & 1 deletion data/examples/pretalx/speakers.json
@@ -83,7 +83,7 @@
"en": "Social (LinkedIn)"
}
},
"answer": "https://www.linkedin.com/in/F3DC8A/",
"answer": "https://www.linkedin.com/in/F3DC8A",
"answer_file": null,
"submission": null,
"review": null,
26 changes: 24 additions & 2 deletions data/examples/pretalx/submissions.json
@@ -28,13 +28,22 @@
"abstract": "This is the abstract of the talk, it should be about Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec condimentum viverra ante in dignissim. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec molestie lorem enim, id dignissim mi faucibus a. Suspendisse mollis lobortis mollis. Praesent eu lorem id velit maximus blandit eget at nisl. Quisque fringilla pharetra euismod. Morbi id ante vitae tortor volutpat interdum fermentum id tortor. Vivamus ligula nisl, mattis molestie purus vel, interdum venenatis nulla. Nam suscipit scelerisque ornare. Ut consequat sem vel sapien porta pretium. Nullam non lacinia nulla, a tincidunt dui. Sed consequat nibh in nibh ornare, rhoncus sollicitudin sem lobortis. Etiam molestie est et felis sollicitudin, commodo facilisis mi vehicula. Quisque pharetra consectetur ligula, sit amet tincidunt nibh consectetur fringilla. Suspendisse eu libero sed magna malesuada bibendum sed et enim. Phasellus convallis tortor nec lectus venenatis, id tristique quam finibus.",
"description": null,
"duration": 45,
"resources": [
{
"resource": "https://example.com/notebook.ipynb",
"description": "Notebook used in the talk"
},
{
"resource": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"description": "Video of the robot in action"
}
],
"slot_count": 1,
"do_not_record": false,
"is_featured": false,
"content_locale": "en",
"slot": null,
"image": null,
"resources": [],
"answers": [
{
"question": {
@@ -132,7 +141,7 @@
},
"track_id": 4493,
"state": "confirmed",
"abstract": "This is the abstract of the talk, it should be about Lorem ipsum dolor sit amet",
"abstract": "This is the abstract of the shorter talk, it should be about Lorem ipsum dolor sit amet",
"description": null,
"duration": 30,
"slot_count": 1,
@@ -157,6 +166,19 @@
"person": null,
"options": []
},
{
"question": {
"id": 3412,
"question": {
"en": "Abstract as a tweet / toot"
}
},
"answer": "Hey, short tweet",
"answer_file": null,
"submission": "B8CD4F",
"review": null,
"person": null
},
{
"question": {
"id": 3412,
4 changes: 0 additions & 4 deletions data/public/europython-2024/.gitignore

This file was deleted.

1 change: 0 additions & 1 deletion data/raw/europython-2024/.gitignore

This file was deleted.

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -0,0 +1,2 @@
[tool.isort]
profile = "black"
1 change: 1 addition & 0 deletions requirements.in
@@ -4,5 +4,6 @@ pre-commit

requests
pydantic
python-dotenv
python-slugify
tqdm