
Agentic AI workflow for Predictive Maintenance #304


Open

wants to merge 38 commits into main
Conversation

@vikalluru vikalluru commented Jun 6, 2025

Agentic AI Workflow for Predictive Maintenance

Summary

This project implements a multi-agent predictive maintenance system for aircraft turbofan engines using the NeMo Agent Toolkit and the NASA C-MAPSS dataset. The system's design is flexible and can be adapted to any agentic workflow that involves retrieving, analyzing, and plotting time series data.

Architecture

  • ReAct Agent: Orchestrates the main workflow using the ReAct pattern with NIM LLM integration.
  • SQL Retriever Tool: Automatically generates SQL queries, using a ChromaDB vector database for schema retrieval (a minimal sketch follows this list).
  • RUL Prediction Tool: Employs an XGBoost regression model with StandardScaler preprocessing for predicting Remaining Useful Life (RUL) (also sketched after this list).
  • Anomaly Detection Tool: Detects anomalies in sensor data using time-series foundation models.
  • Plotting Agent: A multi-tool visualization agent that supports distribution, comparison, and time-series plots.
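
The two data-centric tools above can be pictured with the following minimal sketches. They are illustrative only, not this PR's actual code: the collection name, file paths, column names, and helper functions (retrieve_schema, train_rul_model, predict_rul) are assumptions, and the real tools are wired into the NeMo Agent Toolkit rather than called directly.

# Sketch 1: schema retrieval from a ChromaDB vector store, which an LLM can then
# use as context when generating a SQL query (illustrative names and paths).
import chromadb

chroma = chromadb.PersistentClient(path="./database")
schemas = chroma.get_or_create_collection("table_schemas")
schemas.add(
    ids=["train_fd001", "test_fd001"],
    documents=[
        "CREATE TABLE train_fd001 (unit INTEGER, cycle INTEGER, op_setting_1 REAL, sensor_1 REAL, sensor_2 REAL)",
        "CREATE TABLE test_fd001 (unit INTEGER, cycle INTEGER, op_setting_1 REAL, sensor_1 REAL, sensor_2 REAL)",
    ],
)

def retrieve_schema(question: str, k: int = 2) -> str:
    """Return the k table schemas most relevant to the user's question."""
    hits = schemas.query(query_texts=[question], n_results=k)
    return "\n".join(hits["documents"][0])

# Sketch 2: RUL prediction with StandardScaler preprocessing feeding an XGBoost
# regressor, wrapped in a scikit-learn Pipeline (illustrative column names).
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

SENSOR_COLS = [f"sensor_{i}" for i in range(1, 22)]  # hypothetical C-MAPSS sensor columns

def train_rul_model(df: pd.DataFrame) -> Pipeline:
    """Fit scaler + regressor on sensor readings labeled with remaining useful life."""
    model = Pipeline([
        ("scaler", StandardScaler()),
        ("xgb", XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)),
    ])
    model.fit(df[SENSOR_COLS], df["rul"])
    return model

def predict_rul(model: Pipeline, latest_cycles: pd.DataFrame) -> list[float]:
    """Predict RUL (in cycles) for each row of recent sensor data."""
    return model.predict(latest_cycles[SENSOR_COLS]).tolist()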

Salient Features

  • Utilizes a Reasoning model to generate a plan.
  • Includes an evaluation dataset with queries for tasks such as retrieval, prediction, anomaly detection, and visualization.
  • Users can customize or swap the dataset by providing the necessary documentation in the prompt.
  • A custom multimodal LLM is used to judge generated text and plots (see the sketch after this list).
  • Provides tracing options with Phoenix and RAGA AI.
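
As a rough illustration of the judging step, here is a minimal sketch that scores a generated text answer against a reference using the OpenAI-compatible client pointed at NVIDIA's API catalog. The prompt wording and 0-10 scoring scheme are assumptions, and judging plots would additionally require sending the image in whatever input format the chosen multimodal model expects, which is not shown here.

# Illustrative LLM-as-judge sketch (assumed prompt and scoring; text answers only).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

def judge_answer(question: str, reference: str, candidate: str) -> str:
    prompt = (
        "You are grading a predictive-maintenance assistant.\n"
        f"Question: {question}\nReference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Reply with a score from 0 to 10 and one sentence of justification."
    )
    resp = client.chat.completions.create(
        # Judge model this PR eventually switched to (see discussion below).
        model="nvidia/llama-3.1-nemotron-nano-vl-8b-v1",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content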

Contributors: Vineeth Kalluru, Janaki Vamaraju, Sugandha Sharma, Ze Yang, Viraj Modak

vikalluru added 7 commits June 6, 2025 11:57
@vikalluru vikalluru marked this pull request as ready for review June 6, 2025 22:26
vikalluru and others added 22 commits June 11, 2025 23:32
@dglogo dglogo self-requested a review August 15, 2025 21:46

skrithivasan commented Aug 15, 2025

Hey! These are the changes I had to make to resolve setup and functionality issues:
1. Tokenizers Build Issue
Problem: tokenizers==0.13.3 failed to compile with modern Rust compilers
Error: Unsafe casting code in old tokenizers version
Root Cause: The pinned transformers==4.33.3 dependency pulled in an incompatible tokenizers version
Fix:

File: moment/pyproject.toml

- "transformers==4.33.3"
+ "transformers>=4.33.3,<5.0.0"

Impact: Allows compatible tokenizers version with prebuilt wheels
2. Path Configuration Issues
Problem: Even after running 'source dot.env', the ${PWD_PATH} environment variable was undefined, causing path resolution failures
Errors: Read-only filesystem, database not found, and logging failures (pdm.log issues)
Fix:

File: configs/config-reasoning.yml
Changed all instances of ${PWD_PATH} to relative paths:

- path: "${PWD_PATH}/pdm.log"
+ path: "./pdm.log"

- vector_store_path: "${PWD_PATH}/database"
+ vector_store_path: "./database"

- db_path: "${PWD_PATH}/data/nasa_turbo.db"  
+ db_path: "./data/nasa_turbo.db"

# And 10+ other similar path fixes

3. Frontend API Request Issues
Problem: AIQ Toolkit UI sending invalid API parameters
Error: 422 validation errors (max_tokens: 0, stop: true)
Fix:

// File: AIQToolkit-UI/pages/api/chat.ts
// Removed invalid parameters and fixed payload

- max_tokens: 0,
- stop: true,
- model: "string",
- use_knowledge_base: true,
- // ... other invalid params

+ temperature: 0.7,
+ max_tokens: 4000,
+ stream: true

4. Environment Configuration
Problem: NVIDIA_API_KEY and the Catalyst keys were not propagated to the server process; I had to export them manually even after running 'source dot.env'. A hedged workaround sketch follows.
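
One possible workaround, assuming the server entrypoint is Python and the python-dotenv package is available (neither of which this PR confirms), is to load the file explicitly at startup instead of relying on the parent shell; the shell-level equivalent is running "set -a; source dot.env; set +a" before starting the server.

# Hypothetical workaround sketch (python-dotenv); not the fix adopted in this PR.
import os
from dotenv import load_dotenv

# Load KEY=VALUE pairs from dot.env into this process's environment, overriding
# anything the parent shell failed to export.
load_dotenv("dot.env", override=True)

assert os.getenv("NVIDIA_API_KEY"), "NVIDIA_API_KEY still missing after loading dot.env"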

5. Evaluation

Problem: [404] Not Found from the multimodal judge function. So even if the answers match, the evaluation score returned is 0.

@vikalluru
Author

Thank you so much for testing the workflow end-to-end. All your inputs were super valuable. Here is how I solved them:

  1. For the transformers issue, I updated the README to ask users to change the version of this library along with NumPy. That should solve the issue.

  2. For the PWD_PATH-related issues, I removed the dependency on PWD_PATH completely, so there is no need to export that variable anymore. However, the user still has to manually update the path in one place in the config file, which is still better than requiring the annoying "$@" suffix on the AIQ command to inject environment variables. Let me know what you think of this solution.

  3. I was not able to reproduce the frontend API issues myself; this problem should go away when I update this branch to support the NeMo Agent Toolkit's latest version (Update AIQ to NAT in documentation and comments, NeMo-Agent-Toolkit#614).

  4. Fixed the .env issue as well; the environment variables should now get exported properly.

  5. Regarding the evaluation issue, I swapped the multimodal LLM judge from llama-3.2-11b-vision-instruct to NVIDIA's nvidia/llama-3.1-nemotron-nano-vl-8b-v1. This should resolve the issue.

@skrithivasan

The evaluation issue on my end is solved after the model swap. This looks good to me!
