
Agentic AI workflow for Predictive Maintenance #304


Open

wants to merge 38 commits into main
Conversation

@vikalluru vikalluru commented Jun 6, 2025

Agentic AI Workflow for Predictive Maintenance

Summary

This project implements a multi-agent predictive maintenance system for aircraft turbofan engines using the NeMo Agent Toolkit and the NASA C-MAPSS dataset. The system's design is flexible and can be adapted to any agentic workflow that involves retrieving, analyzing, and plotting time series data.

Architecture

  • ReAct Agent: Orchestrates the main workflow using the ReAct pattern with NIM LLM integration.
  • SQL Retriever Tool: Automatically generates SQL queries, using a ChromaDB vector database for schema retrieval (a minimal sketch follows this list).
  • RUL Prediction Tool: Employs an XGBoost regression model with StandardScaler preprocessing for predicting Remaining Useful Life (RUL) (also sketched after this list).
  • Anomaly Detection Tool: Detects anomalies in sensor data using time-series foundation models.
  • Plotting Agent: A multi-tool visualization agent that supports distribution, comparison, and time-series plots.
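
The two data-centric tools above can be pictured with the following minimal sketches. They are illustrative only, not this PR's actual code: the collection name, file paths, column names, and helper functions (retrieve_schema, train_rul_model, predict_rul) are assumptions, and the real tools are wired into the NeMo Agent Toolkit rather than called directly.

# Sketch 1: schema retrieval from a ChromaDB vector store, which an LLM can then
# use as context when generating a SQL query (illustrative names and paths).
import chromadb

chroma = chromadb.PersistentClient(path="./database")
schemas = chroma.get_or_create_collection("table_schemas")
schemas.add(
    ids=["train_fd001", "test_fd001"],
    documents=[
        "CREATE TABLE train_fd001 (unit INTEGER, cycle INTEGER, op_setting_1 REAL, sensor_1 REAL, sensor_2 REAL)",
        "CREATE TABLE test_fd001 (unit INTEGER, cycle INTEGER, op_setting_1 REAL, sensor_1 REAL, sensor_2 REAL)",
    ],
)

def retrieve_schema(question: str, k: int = 2) -> str:
    """Return the k table schemas most relevant to the user's question."""
    hits = schemas.query(query_texts=[question], n_results=k)
    return "\n".join(hits["documents"][0])

# Sketch 2: RUL prediction with StandardScaler preprocessing feeding an XGBoost
# regressor, wrapped in a scikit-learn Pipeline (illustrative column names).
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

SENSOR_COLS = [f"sensor_{i}" for i in range(1, 22)]  # hypothetical C-MAPSS sensor columns

def train_rul_model(df: pd.DataFrame) -> Pipeline:
    """Fit scaler + regressor on sensor readings labeled with remaining useful life."""
    model = Pipeline([
        ("scaler", StandardScaler()),
        ("xgb", XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)),
    ])
    model.fit(df[SENSOR_COLS], df["rul"])
    return model

def predict_rul(model: Pipeline, latest_cycles: pd.DataFrame) -> list[float]:
    """Predict RUL (in cycles) for each row of recent sensor data."""
    return model.predict(latest_cycles[SENSOR_COLS]).tolist()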

Salient Features

  • Utilizes a Reasoning model to generate a plan.
  • Includes an evaluation dataset with queries for tasks such as retrieval, prediction, anomaly detection, and visualization.
  • Users can customize or swap the dataset by providing the necessary documentation in the prompt.
  • A custom multimodal LLM is used to judge generated text and plots (see the sketch after this list).
  • Provides tracing options with Phoenix and RAGA AI.
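
As a rough illustration of the judging step, here is a minimal sketch that scores a generated text answer against a reference using the OpenAI-compatible client pointed at NVIDIA's API catalog. The prompt wording and 0-10 scoring scheme are assumptions, and judging plots would additionally require sending the image in whatever input format the chosen multimodal model expects, which is not shown here.

# Illustrative LLM-as-judge sketch (assumed prompt and scoring; text answers only).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

def judge_answer(question: str, reference: str, candidate: str) -> str:
    prompt = (
        "You are grading a predictive-maintenance assistant.\n"
        f"Question: {question}\nReference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Reply with a score from 0 to 10 and one sentence of justification."
    )
    resp = client.chat.completions.create(
        # Judge model this PR eventually switched to (see discussion below).
        model="nvidia/llama-3.1-nemotron-nano-vl-8b-v1",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content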

Contributors: Vineeth Kalluru, Janaki Vamaraju, Sugandha Sharma, Ze Yang, Viraj Modak

vikalluru added 7 commits June 6, 2025 11:57
@vikalluru vikalluru marked this pull request as ready for review June 6, 2025 22:26
vikalluru and others added 22 commits June 11, 2025 23:32
@dglogo dglogo self-requested a review August 15, 2025 21:46

skrithivasan commented Aug 15, 2025

Hey! These are the changes I had to make to resolve setup and functionality issues:
1. Tokenizers Build Issue
Problem: tokenizers==0.13.3 failed to compile with modern Rust compilers
Error: Unsafe casting code in old tokenizers version
Root Cause: The pinned transformers==4.33.3 dependency pulled in an incompatible tokenizers version
Fix:

File: moment/pyproject.toml

- "transformers==4.33.3"
+ "transformers>=4.33.3,<5.0.0"

Impact: Allows compatible tokenizers version with prebuilt wheels
2. Path Configuration Issues
Problem: Even after running 'source dot.env', the ${PWD_PATH} environment variable was undefined, causing path resolution failures
Errors: Read-only filesystem, database not found, and logging failures (pdm.log issues)
Fix:

File: configs/config-reasoning.yml
Changed all instances of ${PWD_PATH} to relative paths:

- path: "${PWD_PATH}/pdm.log"
+ path: "./pdm.log"

- vector_store_path: "${PWD_PATH}/database"
+ vector_store_path: "./database"

- db_path: "${PWD_PATH}/data/nasa_turbo.db"  
+ db_path: "./data/nasa_turbo.db"

# And 10+ other similar path fixes

3. Frontend API Request Issues
Problem: AIQ Toolkit UI sending invalid API parameters
Error: 422 validation errors (max_tokens: 0, stop: true)
Fix:

// File: AIQToolkit-UI/pages/api/chat.ts
// Removed invalid parameters and fixed payload

- max_tokens: 0,
- stop: true,
- model: "string",
- use_knowledge_base: true,
- // ... other invalid params

+ temperature: 0.7,
+ max_tokens: 4000,
+ stream: true

4. Environment Configuration
Problem: NVIDIA_API_KEY and the Catalyst keys were not propagated to the server process; I had to export them manually even after running 'source dot.env'. A hedged workaround sketch follows.
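
One possible workaround, assuming the server entrypoint is Python and the python-dotenv package is available (neither of which this PR confirms), is to load the file explicitly at startup instead of relying on the parent shell; the shell-level equivalent is running "set -a; source dot.env; set +a" before starting the server.

# Hypothetical workaround sketch (python-dotenv); not the fix adopted in this PR.
import os
from dotenv import load_dotenv

# Load KEY=VALUE pairs from dot.env into this process's environment, overriding
# anything the parent shell failed to export.
load_dotenv("dot.env", override=True)

assert os.getenv("NVIDIA_API_KEY"), "NVIDIA_API_KEY still missing after loading dot.env"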

5. Evaluation

Problem: [404] Not Found from the multimodal judge function. So even if the answers match, the evaluation score returned is 0.

@vikalluru
Author

Thank you so much for testing the workflow end-to-end. All your inputs were super valuable. Here is how I solved them:

  1. For the transformers issue, I updated the README to ask users to change the version of this library along with NumPy. That should solve the issue.

  2. For the PWD_PATH-related issues, I removed the dependency on PWD_PATH completely, so there is no need to export that variable anymore. However, the user still has to manually update the path in one place in the config file, which is still better than requiring the annoying "$@" suffix on the AIQ command to inject environment variables. Let me know what you think of this solution.

  3. I was not able to reproduce the frontend API issues myself; this problem should go away when I update this branch to support the NeMo Agent Toolkit's latest version (Update AIQ to NAT in documentation and comments, NeMo-Agent-Toolkit#614).

  4. Fixed the .env issue as well; the environment variables should now get exported properly.

  5. Regarding the evaluation issue, I swapped the multimodal LLM judge from llama-3.2-11b-vision-instruct to NVIDIA's nvidia/llama-3.1-nemotron-nano-vl-8b-v1. This should resolve the issue.

@skrithivasan

The evaluation issue on my end is solved after the model swap. This looks good to me!
