Bookwise-AI is an end-to-end, intelligent book recommendation platform that leverages natural language understanding to deliver highly relevant suggestions. Users describe the kind of book they're interested in using free-form, conversational language, and the system interprets this input with Sentence-BERT embeddings to semantically match it against enriched book descriptions. The project combines a responsive, interactive frontend built with Streamlit, a robust data engineering workflow orchestrated through Apache Airflow, and scalable infrastructure management using Terraform, making it both modular and production-ready.
Access the deployed app: Bookwise-AI on Streamlit Cloud
- Natural language input for book recommendations.
- Semantic similarity matching using all-MiniLM-L6-v2 from SentenceTransformers.
- Book metadata enriched via Google Books API.
- Feedback system for user interaction.
- Automated data ingestion using Apache Airflow DAGs.
- Infrastructure-as-Code with Terraform.
- Deployment on Streamlit Cloud with secure secrets management.
- Python – Core programming language for all components.
- Pandas – Data manipulation and analysis.
- NumPy – Numerical computations.
- Apache Spark – Distributed data processing for scalable transformations.
- Sentence-BERT (all-MiniLM-L6-v2) – Semantic text embeddings for recommendation.
- Apache Airflow – Workflow orchestration for automated data pipelines.
- Docker – Containerization for local Airflow deployment and potential app deployment.
- Google Cloud Platform (GCP) – Used for storing enriched datasets and logging user feedback via Google Sheets.
- Terraform – Infrastructure-as-Code for provisioning GCP resources.
- Streamlit – Interactive web app interface for user input, recommendations, and feedback.
- Dataset: Based on GoodBooks-10K (Kaggle).
- Cleaning: Removed null values, standardized language codes.
- Enrichment: Book descriptions added using Google Books API.
- Embedding: Used Sentence-BERT to convert descriptions into vector representations.
- User input is embedded using the same model.
- Cosine similarity is used to match user input with book embeddings.
- Top-N books are recommended based on similarity score.
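The matching steps above can be sketched with plain NumPy. Here, `embeddings` stands in for the precomputed `embeddings.npy` matrix, and in the real app the query vector would come from the same Sentence-BERT model; the toy vectors below are purely illustrative:

```python
import numpy as np

def top_n_books(query_vec, embeddings, n=3):
    """Return indices of the n most similar book embeddings (cosine similarity)."""
    # Normalize rows so a plain dot product equals cosine similarity.
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query_vec / np.linalg.norm(query_vec)
    scores = emb_norm @ q_norm
    # Highest similarity scores first.
    return np.argsort(scores)[::-1][:n]

# Toy example: four "books" in a 3-dimensional embedding space.
books = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_n_books(query, books, n=2))  # the two closest books
```

In the deployed app the same idea applies, only with 384-dimensional Sentence-BERT vectors and the full enriched catalog.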
- Simple UI to input preferences and display results.
- Adjustable number of recommendations via slider.
- Displays book title, author, rating, image, and description.
- Feedback radio buttons (Yes/No).
- External link to Google Search for more details.
- User queries and feedback are logged to a Google Sheet using a GCP service account.
- Secure secrets management is handled directly through Streamlit Cloud.
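A minimal sketch of how a feedback row might be assembled before being appended to the sheet. The column layout here is an assumption for illustration, not the app's actual schema; the gspread call is shown only as a comment:

```python
from datetime import datetime, timezone

def build_feedback_row(query: str, book_title: str, helpful: bool) -> list:
    """Assemble one feedback row: UTC timestamp, user query, book, Yes/No answer."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return [timestamp, query, book_title, "Yes" if helpful else "No"]

row = build_feedback_row("cozy mystery set in a bookshop", "Some Book Title", True)
# In the app, a gspread worksheet opened with the service-account credentials
# would receive this row, e.g.: worksheet.append_row(row)
```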
bookwise-ai/
├── airflow/                 # Airflow DAGs and logs
│   ├── dags/                # Python upload DAGs
│   ├── data/                # Processed data files
│   ├── docker/              # Docker Compose setup for Airflow
│   ├── logs/                # DAG logs
│   └── plugins/             # (optional) custom Airflow plugins
├── authentication/          # API keys and service account credentials (not used in deployment)
├── data/                    # Cleaned and enriched datasets
├── infrastructure/          # Terraform files for infrastructure provisioning
├── notebooks/               # Jupyter notebooks for data exploration
├── streamlit/               # Streamlit app files
│   ├── app.py               # Streamlit app entry point
│   ├── Dockerfile           # Optional Dockerfile for local deployment
│   ├── requirements.txt     # Python dependencies
│   ├── embeddings.npy       # Precomputed embeddings
│   ├── enriched_data.csv    # Final enriched dataset
│   └── all-MiniLM-L6-v2/    # Local version of the embedding model
├── LICENSE
└── README.md
git clone https://github.com/your-username/bookwise-ai.git
cd bookwise-ai
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r streamlit/requirements.txt
streamlit run streamlit/app.py
Note: Feedback logging will only work if credentials are properly set up in Streamlit Cloud.
Secrets (such as Google Sheets API credentials) are managed via the Streamlit Cloud UI:
- Go to your app's dashboard on Streamlit Cloud.
- Click the gear icon next to your app name.
- Select the Secrets tab.
- Add your service account credentials in TOML format:
[gcp_service_account]
type = "service_account"
project_id = "your-project-id"
private_key_id = "..."
private_key = "..."
client_email = "..."
client_id = "..."
There is no need to include a .streamlit/secrets.toml file in your repository.
The Airflow DAGs automate the upload of data files to GCP or local storage.
- upload_raw_data
- upload_cleaned_raw_data
- upload_enriched_data
- upload_embeddings
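Each DAG ultimately pushes a local file to a GCS bucket. A hedged sketch of such an upload task is below, with the storage client passed in so the logic is easy to test; the bucket and file names are illustrative, not the project's actual values:

```python
def upload_file(client, bucket_name: str, local_path: str, blob_name: str) -> str:
    """Upload one local file to a GCS bucket and return its gs:// path.

    `client` is expected to behave like google.cloud.storage.Client;
    inside a DAG this function would typically run via a PythonOperator.
    """
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"
```

Injecting the client also lets the task fall back to a local-storage stub when GCP credentials are unavailable.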
cd airflow/docker
docker-compose up
Visit http://localhost:8080 to access the Airflow UI.
The infrastructure/ directory contains Terraform configuration to provision:
- GCS buckets
- Service accounts
- IAM roles
cd infrastructure
terraform init
terraform apply
Ensure you are authenticated with Google Cloud CLI before running Terraform.
- Replace sklearn similarity with FAISS or Annoy for large-scale vector search.
- Add genre and tag filters for better discovery.
- Implement user sessions and personalization.
- Integrate usage analytics and dashboarding.
- Dockerize the app for local or containerized cloud deployment.
This project is licensed under the MIT License.