diff --git a/content/develop/ai/langcache.md b/content/develop/ai/langcache.md
deleted file mode 100644
index 5c6e88927..000000000
--- a/content/develop/ai/langcache.md
+++ /dev/null
@@ -1,96 +0,0 @@
----
-Title: Redis LangCache
-alwaysopen: false
-categories:
-- docs
-- develop
-- ai
-description: Redis LangCache provides semantic caching-as-a-service to reduce LLM costs and improve response times for AI applications.
-linkTitle: LangCache
-weight: 30
----
-
-Redis LangCache is a fully-managed semantic caching service that reduces large language model (LLM) costs and improves response times for AI applications.
-
-## How LangCache works
-
-LangCache uses semantic caching to store and reuse previous LLM responses for similar queries. Instead of calling the LLM for every request, LangCache:
-
-- **Checks for similar cached responses** when a new query arrives
-- **Returns cached results instantly** if a semantically similar response exists
-- **Stores new responses** for future reuse when no cache match is found
-
-## Key benefits
-
-### Cost reduction
-LangCache significantly reduces LLM costs by eliminating redundant API calls. Since up to 90% of LLM requests are repetitive, caching frequently-requested responses provides substantial cost savings.
-
-### Improved performance
-Cached responses are retrieved from memory, providing response times up to 15 times faster than LLM API calls. This improvement is particularly beneficial for retrieval-augmented generation (RAG) applications.
-
-### Simple deployment
-LangCache is available as a managed service through a REST API. The service includes:
-
-- Automated embedding generation
-- Configurable cache controls
-- Simple billing structure
-- No database management required
-
-### Advanced cache management
-The service provides comprehensive cache management features:
-
-- Data access and privacy controls
-- Configurable eviction protocols
-- Usage monitoring and analytics
-- Cache hit rate tracking
-
-## Use cases
-
-### AI assistants and chatbots
-Optimize conversational AI applications by caching common responses and reducing latency for frequently asked questions.
-
-### RAG applications
-Improve retrieval-augmented generation performance by caching responses to similar queries, reducing both cost and response time.
-
-### AI agents
-Enhance multi-step reasoning chains and agent workflows by caching intermediate results and common reasoning patterns.
-
-### AI gateways
-Integrate LangCache into centralized AI gateway services to manage and control LLM costs across multiple applications.
-
-## Getting started
-
-LangCache is currently available through a private preview program. The service is accessible via REST API and supports any programming language.
-
-### Prerequisites
-
-To use LangCache, you need:
-
-- An AI application that makes LLM API calls
-- A use case involving repetitive or similar queries
-- Willingness to provide feedback during the preview phase
-
-### Access
-
-LangCache is offered as a fully-managed cloud service. During the private preview:
-
-- Participation is free
-- Usage limits may apply
-- Dedicated support is provided
-- Regular feedback sessions are conducted
-
-## Data security and privacy
-
-LangCache stores your data on your Redis servers. Redis does not access your data or use it to train AI models. The service maintains enterprise-grade security and privacy standards.
-
-## Support
-
-Private preview participants receive:
-
-- Dedicated onboarding resources
-- Documentation and tutorials
-- Email and chat support
-- Regular check-ins with the product team
-- Exclusive roadmap updates
-
-For more information about joining the private preview, visit the [Redis LangCache website](https://redis.io/langcache/).
diff --git a/content/develop/ai/langcache/_index.md b/content/develop/ai/langcache/_index.md
new file mode 100644
index 000000000..db56ec3c9
--- /dev/null
+++ b/content/develop/ai/langcache/_index.md
@@ -0,0 +1,111 @@
+---
+Title: Redis LangCache
+alwaysopen: false
+categories:
+- docs
+- develop
+- ai
+description: Store LLM responses for AI apps in a semantic cache.
+linkTitle: LangCache
+hideListLinks: true
+weight: 30
+---
+
+Redis LangCache is a fully-managed semantic caching service that reduces large language model (LLM) costs and improves response times for AI applications.
+
+## LangCache overview
+
+LangCache uses semantic caching to store and reuse previous LLM responses for repeated queries. Instead of calling the LLM for every request, LangCache checks if a similar response has already been generated and is stored in the cache. If a match is found, LangCache returns the cached response instantly, saving time and resources.
+
+Imagine you’re using an LLM to build an agent to answer questions about your company's products. Your users may ask questions like the following:
+
+- "What are the features of Product A?"
+- "Can you list the main features of Product A?"
+- "Tell me about Product A’s features."
+
+These prompts may have slight variations, but they essentially ask the same question. LangCache can help you avoid calling the LLM for each of these prompts by caching the response to the first prompt and returning it for any similar prompts.
+
+Using LangCache as a semantic caching service has the following benefits:
+
+- **Lower LLM costs**: Reduce costly LLM calls by easily storing the most frequently requested responses.
+- **Faster AI app responses**: Get faster AI responses by retrieving previously stored requests from memory.
+- **Simpler deployments**: Access our managed service using a REST API with automated embedding generation, configurable cache controls, and no database management required.
+- **Advanced cache management**: Manage data access, privacy, and eviction protocols, and monitor usage and cache hit rates.
+
+LangCache works well for the following use cases:
+
+- **AI assistants and chatbots**: Optimize conversational AI applications by caching common responses and reducing latency for frequently asked questions.
+- **RAG applications**: Enhance retrieval-augmented generation performance by caching responses to similar queries, reducing both cost and response time.
+- **AI agents**: Improve multi-step reasoning chains and agent workflows by caching intermediate results and common reasoning patterns.
+- **AI gateways**: Integrate LangCache into centralized AI gateway services to manage and control LLM costs across multiple applications.
+
+### LLM cost reduction with LangCache
+
+{{< embed-md "langcache-cost-reduction.md" >}}
+
+## LangCache architecture
+
+The following diagram shows how you can integrate LangCache into your GenAI app:
+
+{{< image filename="images/rc/langcache-process.png" alt="The LangCache process diagram." >}}
+
+1. A user sends a prompt to your AI app.
+1. Your app sends the prompt to LangCache through the `POST /v1/caches/{cacheId}/search` endpoint.
+1. LangCache calls an embedding model service to generate an embedding for the prompt.
+1. LangCache searches the cache to see if a similar response already exists by matching the embeddings of the new query with the stored embeddings.
+1. If a semantically similar entry is found (also known as a cache hit), LangCache gets the cached response and returns it to your app. Your app can then send the cached response back to the user.
+1. If no match is found (also known as a cache miss), your app receives an empty response from LangCache. Your app then queries your chosen LLM to generate a new response.
+1. Your app sends the prompt and the new response to LangCache through the `POST /v1/caches/{cacheId}/entries` endpoint.
+1. LangCache stores the embedding with the new response in the cache for future use.
+
+See the [LangCache API reference]({{< relref "/develop/ai/langcache/api-reference" >}}) for more information on how to use the LangCache API.
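+
+The sketch below shows one way to wire this flow into a client app using `cURL`. It's a minimal example, not a definitive implementation: it assumes the `$HOST`, `$CACHE_ID`, and `$API_KEY` shell variables hold your LangCache base URL, cache ID, and service API key, uses a placeholder in place of a real LLM call, and treats an empty search result as a cache miss (see the API reference for the exact response format).
+
+```bash
+# 1. Search the cache for a semantically similar prompt.
+match=$(curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/search" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $API_KEY" \
+  -d '{"prompt": "What are the features of Product A?"}')
+
+if [ -n "$match" ] && [ "$match" != "[]" ]; then
+  # 2. Cache hit: return the cached response to the user.
+  echo "$match"
+else
+  # 3. Cache miss: call your LLM to generate a new response (placeholder below).
+  response="Product A includes the following features: ..."
+
+  # 4. Store the prompt and the new response for future reuse.
+  curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/entries" \
+    -H "Content-Type: application/json" \
+    -H "Authorization: Bearer $API_KEY" \
+    -d "{\"prompt\": \"What are the features of Product A?\", \"response\": \"$response\"}"
+fi
+```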
+
+## Get started
+
+LangCache is currently in preview:
+
+- Public preview on [Redis Cloud]({{< relref "/operate/rc/langcache" >}})
+- Fully-managed [private preview](https://redis.io/langcache/)
+
+{{< multitabs id="langcache-get-started"
+ tab1="Redis Cloud"
+ tab2="Private preview" >}}
+
+{{< embed-md "rc-langcache-get-started.md" >}}
+
+-tab-sep-
+
+### Prerequisites
+
+To use LangCache in private preview, you need:
+
+- An AI application that makes LLM API calls
+- A use case involving repetitive or similar queries
+- Willingness to provide feedback during the preview phase
+
+### Access
+
+LangCache is offered as a fully-managed service. During the private preview:
+
+- Participation is free
+- Usage limits may apply
+- Dedicated support is provided
+- Regular feedback sessions are conducted
+
+### Data security and privacy
+
+LangCache stores your data on your Redis servers. Redis does not access your data or use it to train AI models. The service maintains enterprise-grade security and privacy standards.
+
+### Support
+
+Private preview participants receive:
+
+- Dedicated onboarding resources
+- Documentation and tutorials
+- Email and chat support
+- Regular check-ins with the product team
+- Exclusive roadmap updates
+
+For more information about joining the private preview, visit the [Redis LangCache website](https://redis.io/langcache/).
+
+{{< /multitabs >}}
diff --git a/content/develop/ai/langcache/api-reference.md b/content/develop/ai/langcache/api-reference.md
new file mode 100644
index 000000000..c94ed98dd
--- /dev/null
+++ b/content/develop/ai/langcache/api-reference.md
@@ -0,0 +1,129 @@
+---
+alwaysopen: false
+categories:
+- docs
+- develop
+- ai
+description: Learn to use the Redis LangCache API for semantic caching.
+hideListLinks: true
+linktitle: API and SDK reference
+title: LangCache API and SDK reference
+weight: 10
+---
+
+You can use the LangCache API from your client app to store and retrieve LLM, RAG, or agent responses.
+
+To access the LangCache API, you need:
+
+- LangCache API base URL
+- LangCache service API key
+- Cache ID
+
+When you call the API, you need to pass the LangCache API key in the `Authorization` header as a Bearer token and the Cache ID as the `cacheId` path parameter.
+
+For example, to check the health of the cache using `cURL`:
+
+```bash
+curl -s -X GET "https://$HOST/v1/caches/$CACHE_ID/health" \
+ -H "accept: application/json" \
+ -H "Authorization: Bearer $API_KEY"
+```
+
+This example expects the following variables to be set in the shell:
+
+- **$HOST** - the LangCache API base URL
+- **$CACHE_ID** - the cache ID of your cache
+- **$API_KEY** - the LangCache service API key
+
+{{% info %}}
+This example uses `cURL` and Linux shell scripts to demonstrate the API; you can use any standard REST client or library.
+{{% /info %}}
+
+You can also use the [LangCache SDKs](#langcache-sdk) for JavaScript and Python to access the API.
+
+## API examples
+
+### Check cache health
+
+Use `GET /v1/caches/{cacheId}/health` to check the health of the cache.
+
+```sh
+GET https://[host]/v1/caches/{cacheId}/health
+```
+
+### Search LangCache for similar responses
+
+Use `POST /v1/caches/{cacheId}/search` to search the cache for matching responses to a user prompt.
+
+```sh
+POST https://[host]/v1/caches/{cacheId}/search
+{
+ "prompt": "User prompt text"
+}
+```
+
+Place this call in your client app right before you call your LLM's REST API. If LangCache returns a response, you can send that response back to the user instead of calling the LLM.
+
+If LangCache does not return a response, you should call your LLM's REST API to generate a new response. After you get a response from the LLM, you can [store it in LangCache](#store-a-new-response-in-langcache) for future use.
+
+You can also scope the responses returned from LangCache by adding an `attributes` object to the request. LangCache will only return responses that match the attributes you specify.
+
+```sh
+POST https://[host]/v1/caches/{cacheId}/search
+{
+ "prompt": "User prompt text",
+ "attributes": {
+ "customAttributeName": "customAttributeValue"
+ }
+}
+```
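+
+For reference, here is the same search expressed as a `cURL` command, using the shell variables from the health check example above (the JSON body is passed with `-d`):
+
+```bash
+curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/search" \
+  -H "accept: application/json" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $API_KEY" \
+  -d '{"prompt": "User prompt text"}'
+```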
+
+### Store a new response in LangCache
+
+Use `POST /v1/caches/{cacheId}/entries` to store a new response in the cache.
+
+```sh
+POST https://[host]/v1/caches/{cacheId}/entries
+{
+ "prompt": "User prompt text",
+ "response": "LLM response text"
+}
+```
+
+Place this call in your client app after you get a response from the LLM. This will store the response in the cache for future use.
+
+You can also store the responses with custom attributes by adding an `attributes` object to the request.
+
+```sh
+POST https://[host]/v1/caches/{cacheId}/entries
+{
+ "prompt": "User prompt text",
+ "response": "LLM response text",
+ "attributes": {
+ "customAttributeName": "customAttributeValue"
+ }
+}
+```
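+
+As a `cURL` command, again using the same shell variables, a basic store call looks like this:
+
+```bash
+curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/entries" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $API_KEY" \
+  -d '{"prompt": "User prompt text", "response": "LLM response text"}'
+```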
+
+### Delete cached responses
+
+Use `DELETE /v1/caches/{cacheId}/entries/{entryId}` to delete a specific cached response by its entry ID.
+
+You can also use `DELETE /v1/caches/{cacheId}/entries` to delete multiple cached responses at once. If you provide an `attributes` object, LangCache will delete all responses that match the attributes you specify.
+
+```sh
+DELETE https://[host]/v1/caches/{cacheId}/entries
+{
+ "attributes": {
+ "customAttributeName": "customAttributeValue"
+ }
+}
+```
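+
+As a `cURL` command, a bulk delete scoped by attributes looks like this (using the same shell variables as above):
+
+```bash
+curl -s -X DELETE "https://$HOST/v1/caches/$CACHE_ID/entries" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $API_KEY" \
+  -d '{"attributes": {"customAttributeName": "customAttributeValue"}}'
+```
+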
+## LangCache SDK
+
+If your app is written in JavaScript or Python, you can also use the LangCache Software Development Kits (SDKs) to access the API.
+
+To learn how to use the LangCache SDKs:
+
+- [LangCache SDK for JavaScript](https://www.npmjs.com/package/@redis-ai/langcache)
+- [LangCache SDK for Python](https://pypi.org/project/langcache/)
diff --git a/content/embeds/langcache-cost-reduction.md b/content/embeds/langcache-cost-reduction.md
new file mode 100644
index 000000000..5850486ac
--- /dev/null
+++ b/content/embeds/langcache-cost-reduction.md
@@ -0,0 +1,21 @@
+LangCache reduces your LLM costs by caching responses and avoiding repeated API calls. When a response is served from cache, you don’t pay for output tokens. Input token costs are typically offset by embedding and storage costs.
+
+For every cached response, you'll save the output token cost. To calculate your monthly savings with LangCache, you can use the following formula:
+
+```bash
+Est. monthly savings with LangCache =
+ (Monthly output token costs) × (Cache hit rate)
+```
+
+The more requests you serve from LangCache, the more you save, because you’re not paying to regenerate the output.
+
+Here’s an example:
+- Monthly LLM spend: $200
+- Percentage of output tokens in your spend: 60%
+- Cost of output tokens: $200 × 60% = $120
+- Cache hit rate: 50%
+- Estimated savings: $120 × 50% = $60/month
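+
+If you want to plug in your own numbers, the formula is easy to run as a one-liner; the values below are the example figures above:
+
+```bash
+# $200 monthly spend, 60% of it on output tokens, 50% cache hit rate
+awk -v spend=200 -v output_share=0.60 -v hit_rate=0.50 \
+  'BEGIN { printf "Estimated monthly savings: $%.2f\n", spend * output_share * hit_rate }'
+# Prints: Estimated monthly savings: $60.00
+```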
+
+{{% info %}}
+The formula and numbers above provide a rough estimate of your monthly savings. Actual savings will vary depending on your usage.
+{{% /info %}}
\ No newline at end of file
diff --git a/content/embeds/rc-langcache-get-started.md b/content/embeds/rc-langcache-get-started.md
new file mode 100644
index 000000000..8d7e337fc
--- /dev/null
+++ b/content/embeds/rc-langcache-get-started.md
@@ -0,0 +1,7 @@
+To set up LangCache on Redis Cloud:
+
+1. [Create a database]({{< relref "/operate/rc/databases/create-database" >}}) on Redis Cloud.
+2. [Create a LangCache service]({{< relref "/operate/rc/langcache/create-service" >}}) for your database on Redis Cloud.
+3. [Use the LangCache API]({{< relref "/operate/rc/langcache/use-langcache" >}}) from your client app.
+
+After you set up LangCache, you can [view and edit the cache]({{< relref "/operate/rc/langcache/view-edit-cache" >}}) and [monitor the cache's performance]({{< relref "/operate/rc/langcache/monitor-cache" >}}).
\ No newline at end of file
diff --git a/content/operate/rc/langcache/_index.md b/content/operate/rc/langcache/_index.md
index 5f21d18b7..9d121dd31 100644
--- a/content/operate/rc/langcache/_index.md
+++ b/content/operate/rc/langcache/_index.md
@@ -6,77 +6,21 @@ categories:
- rc
description: Store LLM responses for AI applications in Redis Cloud.
hideListLinks: true
-linktitle: LangCache
-title: Semantic caching with LangCache
+linktitle: LangCache
+title: Semantic caching with LangCache on Redis Cloud
weight: 36
+bannerText: LangCache on Redis Cloud is currently available as a public preview.
+bannerChildren: true
---
-LangCache is a semantic caching service available as a REST API that stores LLM responses for fast and cheaper retrieval, built on the Redis vector database. By using semantic caching, customers can significantly reduce API costs and lower the average latency of their generative AI applications.
+LangCache is a semantic caching service, built on the Redis vector database and available as a REST API, that stores LLM responses for faster and cheaper retrieval. By using semantic caching, you can significantly reduce API costs and lower the average latency of your generative AI applications.
-## LangCache overview
+For more information about how LangCache works, see the [LangCache overview]({{< relref "/develop/ai/langcache" >}}).
-LangCache uses semantic caching to store and reuse previous LLM responses for repeated queries. Instead of calling the LLM for every request, LangCache checks if a similar response has already been generated and is stored in the cache. If a match is found, LangCache returns the cached response instantly, saving time and resources.
+## LLM cost reduction with LangCache
-Imagine you’re using an LLM to build an agent to answer questions about your company's products. Your users may ask questions like the following:
-
-- "What are the features of Product A?"
-- "Can you list the main features of Product A?"
-- "Tell me about Product A’s features."
-
-These prompts may have slight variations, but they essentially ask the same question. LangCache can help you avoid calling the LLM for each of these prompts by caching the response to the first prompt and returning it for any similar prompts.
-
-Using LangCache as a semantic caching service in Redis Cloud has the following benefits:
-
-- **Lower LLM costs**: Reduce costly LLM calls by easily storing the most frequently-requested responses.
-- **Faster AI app responses**: Get faster AI responses by retrieving previously-stored requests from memory.
-- **Simpler Deployments**: Access our managed service via a REST API with automated embedding generation, configurable controls.
-- **Advanced cache management**: Manage data access and privacy, eviction protocols, and monitor usage and cache hit rates.
-
-### LLM cost reduction with LangCache
-
-LangCache reduces your LLM costs by caching responses and avoiding repeated API calls. When a response is served from cache, you don’t pay for output tokens. Input token costs are typically offset by embedding and storage costs.
-
-For every cached response, you'll save the output token cost. To calculate your monthly savings with LangCache, you can use the following formula:
-
-```bash
-Est. monthly savings with LangCache =
- (Monthly output token costs) × (Cache hit rate)
-```
-
-The more requests you serve from LangCache, the more you save, because you’re not paying to regenerate the output.
-
-Here’s an example:
-- Monthly LLM spend: $200
-- Percentage of output tokens in your spend: 60%
-- Cost of output tokens: $200 × 60% = $120
-- Cache hit rate: 50%
-- Estimated savings: $120 × 50% = $60/month
-
-{{% info %}}
-The formula and numbers above provide a rough estimate of your monthly savings. Actual savings will vary depending on your usage.
-{{% /info %}}
-
-## LangCache architecture
-
-The following diagram displays how you can integrate LangCache into your GenAI app:
-
-{{< image filename="images/rc/langcache-process.png" >}}
-
-1. A user sends a prompt to your AI app.
-1. Your app sends the prompt to LangCache through the `POST /v1/caches/{cacheId}/search` endpoint.
-1. LangCache calls an embedding model service to generate an embedding for the prompt.
-1. LangCache searches the cache to see if a similar response already exists by matching the embeddings of the new query with the stored embeddings.
-1. If a semantically similar entry is found (also known as a cache hit), LangCache gets the cached response and returns it to your app. Your app can then send the cached response back to the user.
-1. If no match is found (also known as a cache miss), your app receives an empty response from LangCache. Your app then queries your chosen LLM to generate a new response.
-1. Your app sends the prompt and the new response to LangCache through the `POST /v1/caches/{cacheId}/entries` endpoint.
-1. LangCache stores the embedding with the new response in the cache for future use.
+{{< embed-md "langcache-cost-reduction.md" >}}
## Get started with LangCache on Redis Cloud
-To set up LangCache on Redis Cloud:
-
-1. [Create a database]({{< relref "/operate/rc/databases/create-database" >}}) on Redis Cloud.
-2. [Create a LangCache service]({{< relref "/operate/rc/langcache/create-service" >}}) for your database.
-3. [Use the LangCache API]({{< relref "/operate/rc/langcache/use-langcache" >}}) from your client app.
-
-After you set up LangCache, you can [view and edit the cache]({{< relref "/operate/rc/langcache/view-edit-cache" >}}) and [monitor the cache's performance]({{< relref "/operate/rc/langcache/monitor-cache" >}}).
\ No newline at end of file
+{{< embed-md "rc-langcache-get-started.md" >}}
\ No newline at end of file
diff --git a/content/operate/rc/langcache/use-langcache.md b/content/operate/rc/langcache/use-langcache.md
index f344b685e..80713073d 100644
--- a/content/operate/rc/langcache/use-langcache.md
+++ b/content/operate/rc/langcache/use-langcache.md
@@ -7,7 +7,7 @@ categories:
description: null
hideListLinks: true
linktitle: Use LangCache
-title: Use the LangCache API with your GenAI app
+title: Use the LangCache API on Redis Cloud
weight: 10
---
@@ -19,103 +19,10 @@ To access the LangCache API, you need:
- LangCache service API key
- Cache ID
-The base URL and cache ID are available in the LangCache service's **Configuration** page in the [**Connectivity** section]({{< relref "/operate/rc/langcache/view-edit-cache#connectivity" >}}).
+For LangCache on Redis Cloud, the base URL and cache ID are available in the LangCache service's **Configuration** page in the [**Connectivity** section]({{< relref "/operate/rc/langcache/view-edit-cache#connectivity" >}}).
The LangCache API key is only available immediately after you create the LangCache service. If you lost this value, you will need to [replace the service API key]({{< relref "/operate/rc/langcache/view-edit-cache#replace-service-api-key" >}}) to be able to use the LangCache API.
When you call the API, you need to pass the LangCache API key in the `Authorization` header as a Bearer token and the Cache ID as the `cacheId` path parameter.
-For example, to check the health of the cache using `cURL`:
-
-```bash
-curl -s -X GET "https://$HOST/v1/caches/$CACHE_ID/health" \
- -H "accept: application/json" \
- -H "Authorization: Bearer $API_KEY"
-```
-
-- The example expects several variables to be set in the shell:
-
- - **$HOST** - the LangCache API base URL
- - **$CACHE_ID** - the Cache ID of your cache
- - **$API_KEY** - The LangCache API token
-
-{{% info %}}
-This example uses `cURL` and Linux shell scripts to demonstrate the API; you can use any standard REST client or library.
-{{% /info %}}
-
-## Check cache health
-
-Use `GET /v1/caches/{cacheId}/health` to check the health of the cache.
-
-```sh
-GET https://[host]/v1/caches/{cacheId}/health
-```
-
-## Search LangCache for similar responses
-
-Use `POST /v1/caches/{cacheId}/search` to search the cache for matching responses to a user prompt.
-
-```sh
-POST https://[host]/v1/caches/{cacheId}/search
-{
- "prompt": "User prompt text"
-}
-```
-
-Place this call in your client app right before you call your LLM's REST API. If LangCache returns a response, you can send that response back to the user instead of calling the LLM.
-
-If LangCache does not return a response, you should call your LLM's REST API to generate a new response. After you get a response from the LLM, you can [store it in LangCache](#store-a-new-response-in-langcache) for future use.
-
-You can also scope the responses returned from LangCache by adding an `attributes` object to the request. LangCache will only return responses that match the attributes you specify.
-
-```sh
-POST https://[host]/v1/caches/{cacheId}/search
-{
- "prompt": "User prompt text",
- "attributes": {
- "customAttributeName": "customAttributeValue"
- }
-}
-```
-
-## Store a new response in LangCache
-
-Use `POST /v1/caches/{cacheId}/entries` to store a new response in the cache.
-
-```sh
-POST https://[host]/v1/caches/{cacheId}/entries
-{
- "prompt": "User prompt text",
- "response": "LLM response text"
-}
-```
-
-Place this call in your client app after you get a response from the LLM. This will store the response in the cache for future use.
-
-You can also store the responses with custom attributes by adding an `attributes` object to the request.
-
-```sh
-POST https://[host]/v1/caches/{cacheId}/entries
-{
- "prompt": "User prompt text",
- "response": "LLM response text",
- "attributes": {
- "customAttributeName": "customAttributeValue"
- }
-}
-```
-
-## Delete cached responses
-
-Use `DELETE /v1/caches/{cacheId}/entries/{entryId}` to delete a cached response from the cache.
-
-You can also use `DELETE /v1/caches/{cacheId}/entries` to delete multiple cached responses at once. If you provide an `attributes` object, LangCache will delete all responses that match the attributes you specify.
-
-```sh
-DELETE https://[host]/v1/caches/{cacheId}/entries
-{
- "attributes": {
- "customAttributeName": "customAttributeValue"
- }
-}
-```
+See the [LangCache API reference]({{< relref "/develop/ai/langcache/api-reference" >}}) for more information on how to use the LangCache API.