
[Draft]: cspl-3550: support Ingestion and Indexing separation #1498


Open: wants to merge 5 commits into base: develop


vivekr-splunk
Collaborator

@vivekr-splunk vivekr-splunk commented Apr 28, 2025

🚀 Background

As part of our effort to decouple ingestion and indexing in Splunk Operator for Kubernetes, this proof-of-concept introduces a new IngestionCluster custom resource and adds an IngestionCluster playbook to our splunk-ansible codebase. The ingestion service is now deployed as a Kubernetes StatefulSet, allowing for stable network identities and persistent storage, while still supporting independent scaling and lifecycle management.
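
To make the "stable network identities" point concrete: Kubernetes gives each StatefulSet pod a predictable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>` when the set is backed by a headless Service. The sketch below only builds those names. The StatefulSet and Service names are hypothetical placeholders, not values taken from this PR, and `cluster.local` is assumed as the cluster domain.

```go
package main

import "fmt"

// podDNSNames returns the per-pod DNS names Kubernetes assigns to the pods of
// a StatefulSet governed by a headless Service:
//
//	<statefulset>-<ordinal>.<service>.<namespace>.svc.<clusterDomain>
//
// It performs no lookups; it only formats the names.
func podDNSNames(stsName, headlessSvc, namespace, clusterDomain string, replicas int) []string {
	if replicas < 0 {
		replicas = 0
	}
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d.%s.%s.svc.%s",
			stsName, i, headlessSvc, namespace, clusterDomain))
	}
	return names
}

func main() {
	// Hypothetical names for illustration only; the PR does not state them.
	for _, n := range podDNSNames("example-ingestion", "example-ingestion-headless", "splunk", "cluster.local", 3) {
		fmt.Println(n)
	}
}
```

Because ordinals are reused when a pod is rescheduled, these names stay valid across restarts, which is what lets peers address a specific ingestion pod.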


✨ What’s Changed

  1. New CR

    • ingestioncluster.splunk.com/v4
    • Schema fields for replicas, configMapRef, serviceRef, and HPA parameters.
  2. Controller Enhancements

    • Reconcile logic now watches IngestionCluster resources.
    • Creates/updates:
      • StatefulSet (with readiness/liveness probes)
      • Dedicated ConfigMap & Service
      • Shared Namespace Secret reference
      • HorizontalPodAutoscaler
  3. Splunk-Ansible Playbook

    • New ingestion_cluster.yml role in ansible/roles/
    • Templates to configure the Splunk Docker image for ingestion-specific settings under the IngestionCluster CR.
  4. Sample Manifests & Documentation

    • deploy/ingestioncluster-sample.yaml demonstrating:
      • IngestionCluster CR
      • Corresponding ConfigMap & Service
    • Quick-start instructions in docs/ingestion-cluster.md.
  5. Connectivity Configuration

    • Environment variables and service endpoints configured to allow StatefulSet pods to communicate securely with existing IndexerCluster StatefulSets.
  6. Testing

    • Manual POC walkthrough in a test cluster:
      • Deploy CR → verify pods come up → simulate load → observe HPA scaling → confirm indexing connectivity.
    • Automated unit tests for controller reconciliation logic.
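
As a rough illustration of items 1 and 2 above, the following sketch models the listed schema fields (replicas, configMapRef, serviceRef, HPA parameters) and the set of child resources the controller is described as creating. It is not the code in this PR: the Go type names, field names, validation rules, creation order, and the choice to make the HPA optional are all assumptions made for the example, and it uses only the standard library rather than the real Kubernetes API types.

```go
package main

import (
	"errors"
	"fmt"
)

// HPASpec is a hypothetical shape for the "HPA parameters" in the description.
type HPASpec struct {
	MinReplicas          int32
	MaxReplicas          int32
	TargetCPUUtilization int32 // percent; assumed metric, not stated in the PR
}

// IngestionClusterSpec is an illustrative guess at the CR spec, covering only
// the fields the description names.
type IngestionClusterSpec struct {
	Replicas     int32
	ConfigMapRef string
	ServiceRef   string
	HPA          *HPASpec // nil means "no autoscaling" in this sketch
}

// Validate applies a few plausible sanity checks (assumed, not from the PR).
func (s IngestionClusterSpec) Validate() error {
	if s.Replicas < 1 {
		return errors.New("replicas must be at least 1")
	}
	if s.HPA != nil && (s.HPA.MinReplicas < 1 || s.HPA.MaxReplicas < s.HPA.MinReplicas) {
		return fmt.Errorf("invalid HPA bounds: min=%d max=%d", s.HPA.MinReplicas, s.HPA.MaxReplicas)
	}
	return nil
}

// childResources lists the kinds the controller is described as managing.
// The order (config and service before the workload) is an assumption.
func childResources(s IngestionClusterSpec) []string {
	kinds := []string{"ConfigMap", "Service", "StatefulSet"}
	if s.HPA != nil {
		kinds = append(kinds, "HorizontalPodAutoscaler")
	}
	return kinds
}

func main() {
	spec := IngestionClusterSpec{
		Replicas:     3,
		ConfigMapRef: "example-ingestion-config", // hypothetical name
		ServiceRef:   "example-ingestion-svc",    // hypothetical name
		HPA:          &HPASpec{MinReplicas: 3, MaxReplicas: 6, TargetCPUUtilization: 70},
	}
	if err := spec.Validate(); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	fmt.Println("would reconcile:", childResources(spec))
}
```

In the actual operator these would be kubebuilder-annotated types in api/v4/ingestioncluster_types.go and the reconciler would build real API objects; the sketch only shows how the spec fields map to the managed resources.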

🧪 How to Test (POC)

  1. Build & Deploy Operator

    make docker-build IMG=<your-registry>/splunk-operator:pr-poc
    make deploy IMG=<your-registry>/splunk-operator:pr-poc
  2. Apply Sample IngestionCluster

    kubectl apply -f deploy/ingestioncluster-sample.yaml
  3. Verify Resources

    # Check CR status
    kubectl get ingestioncluster -n splunk
    
    # Verify StatefulSet, ConfigMap, Service, HPA exist
    kubectl get sts,cm,svc,hpa -l app=splunk-ingestioncluster -n splunk
  4. Connectivity Check

    • Exec into an ingestion-cluster pod and curl the IndexerCluster service endpoint.
    • Confirm logs show successful registration.
  5. Scaling

    # Simulate load (e.g., generate synthetic events)
    # Observe HPA scaling:
    kubectl get hpa -n splunk
    kubectl get pods -l app=splunk-ingestioncluster -n splunk
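
When reading the HPA output in step 5, it can help to know roughly what replica count to expect. The sketch below applies the core formula from the Kubernetes HPA documentation, `desired = ceil(current * currentMetric / targetMetric)`, clamped to the configured bounds. It deliberately ignores the tolerance band, stabilization windows, and pod readiness handling that the real autoscaler applies, and the numbers in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas computes the replica count the HPA's core formula would
// request, clamped to [minReplicas, maxReplicas]. Simplified: no tolerance,
// no stabilization window, no unready-pod adjustments.
func desiredReplicas(current int32, currentMetric, targetMetric float64, minReplicas, maxReplicas int32) int32 {
	if current <= 0 || targetMetric <= 0 {
		// The formula is undefined here; leave the count unchanged.
		return current
	}
	d := int32(math.Ceil(float64(current) * currentMetric / targetMetric))
	if d < minReplicas {
		d = minReplicas
	}
	if d > maxReplicas {
		d = maxReplicas
	}
	return d
}

func main() {
	// Hypothetical load: 3 pods averaging 90% CPU against a 60% target.
	fmt.Println(desiredReplicas(3, 90, 60, 3, 6))
	// Hypothetical idle period: 20% CPU; the result is held at minReplicas.
	fmt.Println(desiredReplicas(3, 20, 60, 3, 6))
}
```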

🔜 Next Steps

  • Polish controller error handling and add end-to-end tests.
  • Integrate this POC into our CI pipeline.
  • Review feedback and iterate on schema, playbook tasks, and documentation.

🔗 Related Issues

  • Implements POC proposal (#TODO)
  • Tracks IngestionCluster CR design (#TODO)
  • Splunk-Ansible IngestionCluster role (#TODO)

Reviewer Notes:

  • This PR focuses solely on the POC; production hardening (RBAC, security contexts, resource limits) will follow once feasibility is confirmed.
  • You may deploy the sample manifests into a fresh namespace to avoid conflicts with existing CRs.

@vivekr-splunk vivekr-splunk requested a review from Copilot April 28, 2025 22:24
@vivekr-splunk vivekr-splunk self-assigned this Apr 28, 2025

@Copilot Copilot AI left a comment


Pull Request Overview

This draft PR introduces support for the separation of ingestion and indexing functionality. Key changes include:

  • A new IngestionCluster controller added in controllers/ingestioncluster_controller.go along with supporting test initialization in controllers/suite_test.go.
  • New RBAC roles and sample manifests for managing IngestionCluster resources.
  • Several CRD base files and kustomization updates to support ingestion resources (with accompanying patches for conversion webhook and CA injection).

Reviewed Changes

Copilot reviewed 20 out of 25 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| controllers/suite_test.go | Registers the new enterprisev4 API with a scheme; note the inconsistent use of scheme aliases. |
| controllers/ingestioncluster_controller.go | Introduces the new reconciler and links to existing enterprise apply logic. |
| config/samples/enterprise_v4_ingestioncluster.yaml | Provides a sample manifest for the new IngestionCluster resource. |
| config/rbac/* | Adds new RBAC roles for ingestioncluster viewer and editor and updates general role resources. |
| config/crd/* | Updates CRD bases and kustomization to include ingestionclusters with several new patches (some commented out) and a controller-gen version adjustment. |

Files not reviewed (5)
  • PROJECT: Language not supported
  • config/crd/bases/enterprise.splunk.com_clustermanagers.yaml: Language not supported
  • api/v4/zz_generated.deepcopy.go: Language not supported
  • api/v4/ingestioncluster_types.go: Language not supported
  • config/crd/bases/enterprise.splunk.com_clustermasters.yaml: Language not supported
Comments suppressed due to low confidence (3)

controllers/suite_test.go:97

  • [nitpick] There is an inconsistency in scheme usage: the file imports both 'clientgoscheme' and 'scheme' from the same package. Consider using a single scheme reference for consistency across API registrations.
err = enterprisev4.AddToScheme(scheme.Scheme)

config/crd/kustomization.yaml:29

  • The patch for the ingestionclusters conversion webhook is commented out; please confirm if this is intentional or if it should be enabled to support resource conversion.
#- patches/webhook_in_ingestionclusters.yaml

config/crd/bases/enterprise.splunk.com_standalones.yaml:6

  • The controller-gen annotation has been downgraded from v0.16.1 to v0.14.0. Please verify that this change is intentional and that it does not affect CRD generation or compatibility.
controller-gen.kubebuilder.io/version: v0.14.0
