Refactor Tofu to be upgradable on pull #742

Closed
wants to merge 4 commits into from
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,9 @@ Run the following from the repository root to activate the venv:
Use the `cookiecutter` template to create a new environment to hold your configuration:

cd environments
cookiecutter skeleton
cookiecutter ../cookiecutter

and follow the prompts to complete the environment name and description.
and follow the prompts to complete the environment name and description, leaving `is_site_env` and `parent_site_env` as their defaults.

**NB:** In subsequent sections this new environment is referred to as `$ENV`.

Expand Down
2 changes: 1 addition & 1 deletion ansible/roles/block_devices/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ This is a convenience wrapper around the ansible modules:

To avoid issues with device names changing after e.g. reboots, devices are identified by serial number and mounted by filesystem UUID.

**NB:** This role is ignored[^1] during Packer builds as block devices will not be attached to the Packer build VMs. This role is therefore deprecated and it is suggested that `cloud-init` is used instead. See e.g. `environments/skeleton/{{cookiecutter.environment}}/tofu/control.userdata.tpl`.
**NB:** This role is ignored[^1] during Packer builds as block devices will not be attached to the Packer build VMs. This role is therefore deprecated and it is suggested that `cloud-init` is used instead. See e.g. `tofu/control.userdata.tpl`.

[^1]: See `environments/common/inventory/group_vars/builder/defaults.yml`

Expand Down
4 changes: 2 additions & 2 deletions ansible/roles/freeipa/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Support FreeIPA in the appliance. In production use it is expected the FreeIPA s

## Usage
- Add hosts to the `freeipa_client` group and run (at a minimum) the `ansible/iam.yml` playbook.
- Host names must match the domain name. By default (using the skeleton OpenTofu) hostnames are of the form `nodename.cluster_name.cluster_domain_suffix` where `cluster_name` and `cluster_domain_suffix` are OpenTofu variables.
- Host names must match the domain name. By default (using the cookiecutter OpenTofu) hostnames are of the form `nodename.cluster_name.cluster_domain_suffix` where `cluster_name` and `cluster_domain_suffix` are OpenTofu variables.
- Hosts discover the FreeIPA server FQDN (and their own domain) from DNS records. If DNS servers are not set from DHCP, then use the `resolv_conf` role to configure this. For example when using the in-appliance FreeIPA development server:

```ini
Expand All @@ -28,7 +28,7 @@ Support FreeIPA in the appliance. In production use it is expected the FreeIPA s
- For production use with an external FreeIPA server, a random one-time password (OTP) must be generated when adding hosts to FreeIPA (e.g. using `ipa host-add --random ...`). This password should be set as a hostvar `freeipa_host_password`. Initial host enrolment will use this OTP to enrol the host. After this it becomes irrelevant so it does not need to be committed to git. This approach means the appliance does not require the FreeIPA administrator password.
- For development use with the in-appliance FreeIPA server, `freeipa_host_password` will be automatically generated in memory.
- The `control` host must define `appliances_state_dir` (on persistent storage). This is used to back-up keytabs to allow FreeIPA clients to automatically re-enrol after e.g. reimaging. Note that:
- This is implemented when using the skeleton OpenTofu; on the control node `appliances_state_dir` defaults to `/var/lib/state` which is mounted from a volume.
- This is implemented when using the cookiecutter OpenTofu; on the control node `appliances_state_dir` defaults to `/var/lib/state` which is mounted from a volume.
- Nodes are not re-enrolled by a [Slurm-driven reimage](../../collections/ansible_collections/stackhpc/slurm_openstack_tools/roles/rebuild/README.md) (as that does not run this role).
- If both a backed-up keytab and `freeipa_host_password` exist, the former is used.

Expand Down
6 changes: 6 additions & 0 deletions cookiecutter/cookiecutter.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
    "environment": "foo",
    "description": "Describe the environment here",
    "is_site_env": false,
    "parent_site_env": "None"
}
13 changes: 13 additions & 0 deletions cookiecutter/hooks/post_gen_project.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
import os
import sys

{% if cookiecutter.is_site_env == False %}
os.symlink("../../../tofu/layouts/main.tf", "tofu/main.tf")
os.symlink("../../../tofu/variables.tf", "tofu/variables.tf")
{% endif %}
{% if cookiecutter.parent_site_env != 'None' %}
if not os.path.isdir("../{{ cookiecutter.parent_site_env }}"):
    print("ERROR: Parent environment {{ cookiecutter.parent_site_env }} does not exist")
    sys.exit(1)
os.symlink("../../{{ cookiecutter.parent_site_env }}/tofu/{{ cookiecutter.parent_site_env }}.auto.tfvars","tofu/{{ cookiecutter.parent_site_env }}.auto.tfvars")
{% endif %}
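
Note on the hook above: the symlink targets are relative, so they resolve against the directory that holds the link (`environments/<env>/tofu`), not against the working directory of the hook. A minimal sketch of that resolution follows, assuming the generated environment lives at `environments/foo` and the parent is `site`; the helper name `resolve_link_target` is hypothetical and not part of this PR.

```python
import posixpath


def resolve_link_target(link_path: str, target: str) -> str:
    """Return the repo-relative path that a relative symlink target points at.

    link_path: location of the symlink, relative to the repository root.
    target: the string passed as the first argument to os.symlink().
    """
    link_dir = posixpath.dirname(link_path)
    return posixpath.normpath(posixpath.join(link_dir, target))


# Assumed layout: a new environment at environments/foo with parent "site".
print(resolve_link_target("environments/foo/tofu/main.tf",
                          "../../../tofu/layouts/main.tf"))
print(resolve_link_target("environments/foo/tofu/site.auto.tfvars",
                          "../../site/tofu/site.auto.tfvars"))
```

Under those assumptions the first link lands on the appliance's `tofu/layouts/main.tf` and the second on the parent environment's `site.auto.tfvars`.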
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ stderr_callback = debug
gathering = smart
forks = 30
host_key_checking = False
inventory = ../common/inventory,inventory
inventory = ../common/inventory,{{ '../'+cookiecutter.parent_site_env+'/inventory,' if cookiecutter.parent_site_env != 'None' }}inventory
collections_path = ../../ansible/collections
roles_path = ../../ansible/roles
filter_plugins = ../../ansible/filter_plugins
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Environment {{ 'independent' if cookiecutter.is_site_env else 'specific' }} variables are set here
2 changes: 1 addition & 1 deletion docs/persistent-state.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ If using the `environments/common/layout/everything` Ansible groups template (wh

Note that if `appliances_state_dir` is defined, the path it gives must exist and should be owned by root. Directories will be created within this with appropriate permissions for each item of state defined above. Additionally, the systemd units for the services listed above will be modified to require `appliances_state_dir` to be mounted before service start (via the `systemd` role).

A new cookiecutter-produced environment supports persistent state in the default OpenTofu (see `environments/skeleton/{{cookiecutter.environment}}/tofu/`) by:
A new cookiecutter-produced environment supports persistent state in the default OpenTofu (see `cookiecutter/tofu` and the `./tofu` module) by:

- Defining a volume with a default size of 150GB - this can be controlled by the OpenTofu variable `state_volume_size`.
- Attaching it to the control node.
Expand Down
57 changes: 15 additions & 42 deletions docs/production.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,19 +15,20 @@ production-ready deployments.
A `dev` environment should also be created if considered required, or this
can be left until later.

These can all be produced using the cookicutter instructions, but the
`production` and `staging` environments will need their
`environments/$ENV/ansible.cfg` file modifying so that they point to the
`site` environment:

```ini
inventory = ../common/inventory,../site/inventory,inventory
```
These can all be produced with cookiecutter by running
```sh
cd environments
cookiecutter ../cookiecutter
```
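Set the `is_site_env` prompt to `true` for your `site` environment, and set `parent_site_env` to `site` for your
`production` and `staging` environments. Create the `site` environment first, because the post-generation hook
exits with an error if the parent environment does not exist.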
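
The prompt answers described above could also be supplied non-interactively. The following is a sketch only: it assumes the `cookiecutter` Python package is installed, that it is run from the `environments/` directory, and that the installed cookiecutter version accepts boolean values for `is_site_env`. The helper names `environment_context` and `create_environment` are hypothetical.

```python
def environment_context(name: str, description: str = "", site_env: str = "site") -> dict:
    """Build answers to the template prompts for one environment."""
    is_site = name == site_env
    return {
        "environment": name,
        "description": description or f"{name} environment",
        "is_site_env": is_site,
        # The template uses the string "None" to mean "no parent environment".
        "parent_site_env": "None" if is_site else site_env,
    }


def create_environment(name: str, template: str = "../cookiecutter") -> str:
    """Generate one environment without prompting; returns the new directory."""
    # Third-party import kept local so the helper above works without it.
    from cookiecutter.main import cookiecutter

    return cookiecutter(template, no_input=True,
                        extra_context=environment_context(name))


# Intended order: the site environment must exist before its children.
# for env in ("site", "production", "staging"):
#     create_environment(env)
```

A standalone environment with no parent would instead keep both defaults, as the top-level README describes.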

- To avoid divergence of configuration all possible overrides for group/role
vars should be placed in `environments/site/inventory/group_vars/all/*.yml`
unless the value really is environment-specific (e.g. DNS names for
`openondemand_servername`).
`openondemand_servername`). It is therefore recommended that you delete the initial
`environments/{production,staging}/inventory/group_vars/all/*.yml` files in your `production` and
`staging` environments.

- Where possible hooks should also be placed in `environments/site/hooks/`
and referenced from the `site` and `production` environments, e.g.:
Expand All @@ -38,38 +39,10 @@ and referenced from the `site` and `production` environments, e.g.:
import_playbook: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}/../site/hooks/pre.yml"
```

- OpenTofu configurations should be defined in the `site` environment and used
as a module from the other environments. This can be done with the
cookie-cutter generated configurations:
- Delete the *contents* of the cookie-cutter generated `tofu/` directories
from the `production` and `staging` environments.
- Create a `main.tf` in those directories which uses `site/tofu/` as a
[module](https://opentofu.org/docs/language/modules/), e.g. :

```
...
variable "environment_root" {
type = string
description = "Path to environment root, automatically set by activate script"
}

module "cluster" {
source = "../../site/tofu/"
environment_root = var.environment_root

cluster_name = "foo"
...
}
```

Note that:

- Environment-specific variables (`cluster_name`) should be hardcoded
into the cluster module block.
- Define OpenTofu configurations
- Environment-specific variables (e.g. `cluster_name`) should be set in `environments/$ENV/tofu/{$ENV}.tfvars`.
- Environment-independent variables (e.g. maybe `cluster_net` if the
same is used for staging and production) should be set as *defaults*
in `environments/site/tofu/variables.tf`, and then don't need to
be passed in to the module.
same is used for staging and production) should be set in `environments/site/tofu/site.auto.tfvars`.

- Vault-encrypt secrets. Running the `generate-passwords.yml` playbook creates
a secrets file at `environments/$ENV/inventory/group_vars/all/secrets.yml`.
Expand Down Expand Up @@ -103,8 +76,8 @@ and referenced from the `site` and `production` environments, e.g.:
state_volume_provisioning = "attach"

either for a specific environment within the cluster module block in
`environments/$ENV/tofu/main.tf`, or as the site default by changing the
default in `environments/site/tofu/variables.tf`.
`environments/$ENV/tofu/{$ENV}.tfvars`, or as the site default by setting it in
`environments/site/tofu/site.auto.tfvars`.

For a development environment allowing OpenTofu to manage the volumes using
the default value of `"manage"` for those variables is usually appropriate, as
Expand Down
46 changes: 45 additions & 1 deletion docs/upgrades.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,8 @@ All other commands should be run on the Ansible deploy host.

1. If required, build an "extra" image with local modifications, see [docs/image-build.md](./image-build.md).

1. Modify your site-specific environment to use this image, e.g. via `cluster_image_id` in `environments/$SITE_ENV/tofu/variables.tf`.
1. Modify your site-specific environment to use this image, e.g. via `cluster_image_id` in `environments/site/tofu/site.auto.tfvars` if all environments use the same image
or `environments/$SUB_ENV/tofu/{$SUB_ENV}.tfvars` if the image is environment-specific.

1. Test this in your staging cluster.

Expand Down Expand Up @@ -101,3 +102,46 @@ playbook to reconfigure the cluster, e.g. as described in the main [README.md](.

1. Tell users the cluster is available again.

## Upgrading OpenTofu

As of v2.3, environments now import the appliance's latest Tofu as a module, ensuring that your Tofu infrastructure is up to date
with the configuration expected upstream. Environments defined before v2.3 must therefore be manually migrated to the new model.

### Upgrading Site Environments

1. Identify any custom defaults you have set in your `variables.tf` file with `diff environments/site/tofu/variables.tf tofu/variables.tf`

1. Create a new `environments/site/tofu/site.auto.tfvars` file and, for each variable whose default you previously customised in `variables.tf`, assign that value. For
example,
```hcl
variable "key_pair" {
  type = string
  description = "Name of an existing keypair in OpenStack"
  default = "my-key"
}
```
in `variables.tf` becomes
```hcl
key_pair = "my-key"
```
in `site.auto.tfvars`.

1. Delete the contents of the `environments/site/tofu` directory except for the `site.auto.tfvars` file
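
The first two steps above can be roughed out in code. This is a sketch, not a real HCL parser: it assumes each `variable` block closes with a `}` at the start of a line and only handles defaults that fit on one line. Multi-line defaults (maps, lists) are skipped and need migrating by hand. The function names are hypothetical.

```python
import re

# A top-level variable block whose closing brace starts a line.
VARIABLE_BLOCK = re.compile(
    r'^variable\s+"(?P<name>\w+)"\s*\{(?P<body>.*?)^\}',
    re.DOTALL | re.MULTILINE,
)
# A default that fits on a single line, e.g.  default = "my-key"
SINGLE_LINE_DEFAULT = re.compile(r'^\s*default\s*=\s*(?P<value>.+?)\s*$', re.MULTILINE)


def single_line_defaults(variables_tf: str) -> dict:
    """Map variable name -> default expression for single-line defaults."""
    found = {}
    for block in VARIABLE_BLOCK.finditer(variables_tf):
        match = SINGLE_LINE_DEFAULT.search(block.group("body"))
        if match is None:
            continue
        value = match.group("value")
        if value.endswith(("{", "[", "(")):
            continue  # multi-line default: not handled by this sketch
        found[block.group("name")] = value
    return found


def custom_defaults_as_tfvars(site_tf: str, upstream_tf: str) -> str:
    """Emit tfvars lines for defaults that differ from the upstream file."""
    upstream = single_line_defaults(upstream_tf)
    lines = [
        f"{name} = {value}"
        for name, value in single_line_defaults(site_tf).items()
        if upstream.get(name) != value
    ]
    return "\n".join(lines)
```

Here `site_tf` would be the text of `environments/site/tofu/variables.tf` and `upstream_tf` the text of `tofu/variables.tf`; review the result against the `diff` output before relying on it.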

### Upgrading Production/Staging Environments

1. In the `environments/$ENV_NAME/tofu/main.tf` file of your environment, identify any variables (other than `environment_root`) you have hardcoded as arguments to your site module

1. Move these variable assignments to a new `environments/$ENV_NAME/tofu/$ENV_NAME.tfvars` file

1. Delete the contents of the `environments/$ENV_NAME/tofu` directory, except for `.terraform`, `terraform.tfstate`, `.terraform.lock.hcl` and the new tfvars file

1. Create a symlink at `environments/$ENV_NAME/tofu/main.tf` pointing to `tofu/layouts/main.tf`

1. Create a symlink at `environments/$ENV_NAME/tofu/variables.tf` pointing to `tofu/variables.tf`

1. Create a symlink at `environments/$ENV_NAME/tofu/site.auto.tfvars` pointing to `environments/site/tofu/site.auto.tfvars`

1. Import the new module with `tofu init`

1. Verify that no destructive changes would be made to your existing infrastructure with `tofu plan -var-file=$ENV_NAME.tfvars`
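
The three symlink steps above mirror what `cookiecutter/hooks/post_gen_project.py` does for new environments. A minimal sketch for an existing environment follows, assuming the parent environment is named `site` and the old directory contents have already been removed as described; the function name `link_environment_tofu` is hypothetical.

```python
import os
from pathlib import Path


def link_environment_tofu(repo_root: Path, env_name: str, site_env: str = "site") -> None:
    """Create the relative symlinks that a migrated environment expects."""
    tofu_dir = repo_root / "environments" / env_name / "tofu"
    # Targets are relative to tofu_dir, matching the post-generation hook.
    links = {
        "main.tf": "../../../tofu/layouts/main.tf",
        "variables.tf": "../../../tofu/variables.tf",
        f"{site_env}.auto.tfvars": f"../../{site_env}/tofu/{site_env}.auto.tfvars",
    }
    for name, target in links.items():
        link = tofu_dir / name
        if link.is_symlink():
            link.unlink()  # replace a stale link
        elif link.exists():
            # Refuse to overwrite a real file; it should be removed deliberately.
            raise FileExistsError(f"{link} exists and is not a symlink")
        os.symlink(target, link)
```

After linking, `tofu init` and `tofu plan` are still run by hand from the environment's `tofu` directory, as in the final two steps.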
2 changes: 1 addition & 1 deletion environments/.caas/inventory/group_vars/all/nfs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ caas_nfs_home:
nfs_enable:
server: "{{ inventory_hostname in groups['control'] }}"
clients: "{{ inventory_hostname in groups['cluster'] }}"
nfs_export: "/exports/home" # assumes skeleton TF is being used
nfs_export: "/exports/home" # assumes cookiecutter TF is being used
nfs_client_mnt_point: "/home"

nfs_configurations: "{{ caas_nfs_home if not cluster_home_manila_share | bool else [] }}"
4 changes: 2 additions & 2 deletions environments/.stackhpc/tofu/main.tf
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# This terraform configuration uses the "skeleton" terraform, so that is checked by CI.
# This terraform configuration uses the ./tofu terraform, so that is checked by CI.

terraform {
required_version = ">= 0.14"
Expand Down Expand Up @@ -59,7 +59,7 @@ data "openstack_images_image_v2" "cluster" {
}

module "cluster" {
source = "../../skeleton/{{cookiecutter.environment}}/tofu/"
source = "../../../tofu/"

cluster_name = var.cluster_name
cluster_networks = var.cluster_networks
Expand Down
10 changes: 4 additions & 6 deletions environments/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,17 +35,15 @@ Shared configuration for all environments. This is not
intended to be used as a standalone environment, hence the README does *not* detail
how to provision the infrastructure.

### skeleton

Skeleton directory that is used as a template to create a new environemnt.

## Defining an environment

To define an environment using cookiecutter:

cookiecutter skeleton
cd environments
cookiecutter ../cookiecutter

This will present you with a series of questions which you must answer.
This will present you with a series of questions which you must answer. For guidance on setting
`is_site_env` and `parent_site_env`, see [production docs](../docs/production.md).
Once you have answered all questions, a new environment directory will
be created. The directory will be named according to the answer you gave
for `environment`.
Expand Down
2 changes: 1 addition & 1 deletion environments/common/inventory/group_vars/all/nfs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ nfs_configuration_home_volume: # volume-backed home directories
# Don't mount share on control node:
clients: "{{ inventory_hostname in groups['cluster'] and inventory_hostname not in groups['control'] }}"
nfs_server: "{{ nfs_server_default }}"
nfs_export: "/exports/home" # assumes skeleton TF is being used
nfs_export: "/exports/home" # assumes cookiecutter TF is being used
nfs_client_mnt_point: "/home"
# prevent tunnelling and setuid binaries:
# NB: this is stackhpc.nfs role defaults but are set here to prevent being
Expand Down
4 changes: 0 additions & 4 deletions environments/skeleton/cookiecutter.json

This file was deleted.

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
30 changes: 30 additions & 0 deletions tofu/layouts/main.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
module "cluster" {
source = "../../../tofu"
environment_root = var.environment_root

cluster_name = var.cluster_name
cluster_domain_suffix = var.cluster_domain_suffix
cluster_networks = var.cluster_networks
key_pair = var.key_pair
control_ip_addresses = var.control_ip_addresses
control_node_flavor = var.control_node_flavor
login = var.login
cluster_image_id = var.cluster_image_id
compute = var.compute
additional_nodegroups = var.additional_nodegroups
state_dir = var.state_dir
state_volume_size = var.state_volume_size
state_volume_type = var.state_volume_type
state_volume_provisioning = var.state_volume_provisioning
home_volume_size = var.home_volume_size
home_volume_type = var.home_volume_type
home_volume_provisioning = var.home_volume_provisioning
vnic_types = var.vnic_types
login_security_groups = var.login_security_groups
nonlogin_security_groups = var.nonlogin_security_groups
volume_backed_instances = var.volume_backed_instances
root_volume_size = var.root_volume_size
root_volume_type = var.root_volume_type
gateway_ip = var.gateway_ip
cluster_nodename_template = var.cluster_nodename_template
}
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.