Wallaby updates

Information about the Wallaby configuration service for Condor pools.

Authorization for Wallaby clients

Wallaby 0.16.0, which updates the Wallaby API version to 20101031.6, includes support for authorizing broker users with various roles that can interact with Wallaby in different ways. This post will explain how the authorization support works and show how to get started using it. If you just want to get started using Wallaby with authorization support as quickly as possible, skip ahead to the section titled “Getting Started” below. Detailed information about which role is required for each Wallaby API method is available here.

Overview

Users must authenticate to the AMQP broker before using Wallaby (although some installations may allow users to authenticate as “anonymous”), but previous versions of Wallaby implicitly authorized any user who had authenticated to the broker to perform any action. Wallaby now includes a database mapping from user names to roles, which allows installations to define how each broker user can interact with Wallaby. Each method is annotated with the role required to invoke it, and each method invocation is checked to ensure that the currently-authenticated user is authorized to assume the role required by the method. The roles Wallaby recognizes are NONE, READ, WRITE, and ADMIN, in increasing order of privilege; each role includes all of the capabilities of the roles that precede it.

If WALLABY_USERDB_NAME is set in the Wallaby agent’s environment upon startup and represents a valid pathname, Wallaby will use that as the location of the user-role database. If this variable is set to a valid pathname but no file exists at that pathname, the Wallaby user-role database will be created upon agent startup. If WALLABY_USERDB_NAME is not set, the user-role database will be initialized in memory only and thus will not persist across agent restarts.

Standard authorization

When Wallaby is about to service an API request, it:

  1. checks the role required to invoke the method.

  2. checks the authorization level specified for the user. There are several possibilities under which a user could be authorized to invoke a method:

    • the user is explicitly authorized for a role that includes the required role (e.g. the user has an ADMIN role but the method only requires READ);
    • the user is implicitly authorized for a role that includes the required role (e.g. there is an entry for the wildcard user * giving it READ access and the method requires READ access);
    • the role database is empty, in which case all authenticated users are implicitly authorized for all actions (this is the same behavior as in older versions of Wallaby); or
    • the invocation is of a user-role database maintenance method and the client is authorized via shared secret (see below).
  3. if none of the conditions of the above step hold, the method invocation is unauthorized and fails with an API-level error. If the API method is invoked over the Ruby client library, it will raise an exception. If it is invoked via a wallaby shell command-line tool, it will print a human-readable error message and exit with a nonzero exit status.

  4. if the user is authorized to invoke the method, invocation proceeds normally.
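The decision procedure above can be modeled in a few lines of Python. This is only an illustrative sketch, not Wallaby's actual implementation: the function names, the dictionary standing in for the user-role database, and the choice to authorize whenever any one of the listed conditions holds are all assumptions made for the example.

```python
# Illustrative model only; not Wallaby's actual implementation.
ROLES = ["NONE", "READ", "WRITE", "ADMIN"]  # each role includes those before it

# The three user-role database maintenance methods named in this post.
USERDB_METHODS = {"set_user_privs", "del_user", "users"}


def role_includes(held, required):
    """True if role `held` carries at least the capabilities of `required`."""
    return ROLES.index(held) >= ROLES.index(required)


def is_authorized(user, method, required, user_roles,
                  supplied_secret=None, agent_secret=None):
    """Decide whether `user` may invoke `method`, which needs role `required`.

    `user_roles` is a hypothetical stand-in for the user-role database: it
    maps broker user names (or the wildcard "*") to role names.
    """
    # An empty role database authorizes every authenticated user.
    if not user_roles:
        return True
    # Explicit entry for this user.
    if user in user_roles and role_includes(user_roles[user], required):
        return True
    # Implicit entry via the wildcard user.
    if "*" in user_roles and role_includes(user_roles["*"], required):
        return True
    # Shared-secret authorization applies only to user-role database methods.
    if (method in USERDB_METHODS and agent_secret is not None
            and supplied_secret == agent_secret):
        return True
    return False
```

With a database of `{"alice": "ADMIN", "*": "READ"}`, this model lets alice invoke anything, lets every other user invoke only methods that require READ or NONE, and lets any client that supplies the right secret call the three maintenance methods.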

Authorization with secret-based authentication

This version of the Wallaby API introduces three new methods: Store#set_user_privs, Store#del_user, and Store#users. These enable updating and reading the user-role database; the first two require ADMIN access, while the last requires READ access. Because changes in the user-role database may result in an administrator inadvertently removing administrator rights from his or her broker user, Wallaby provides another mechanism to authorize access to these methods. Each of these three methods supports a special secret option in its options argument. When the Wallaby service starts up, it loads a secret string from a file. Clients that supply the correct secret as an option to one of these calls will be authorized to invoke these calls, even if the broker user making the invocation is not authorized by the user-role database.

The pathname to the secret file is given by the environment variable WALLABY_SECRET_FILE. If this variable is unset upon agent startup, Wallaby will not use a shared secret (and secret-based authorization will not be available to API clients). If this variable is set and names an existing file that the Wallaby agent user can read, the Wallaby shared secret will be set to the entire contents of this file. If this variable is set and names a nonexistent file in a path that does exist, Wallaby will create a file at this path upon startup with a randomly-generated secret (consisting of a digest hash of some data read from /dev/urandom). If this variable is set to a pathname that includes nonexistent directory components, the Wallaby agent will raise an error. If you create your own secret file, ensure that it is only readable by the UNIX user that the Wallaby agent runs as (typically wallaby).
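The four cases in the preceding paragraph can be sketched as a small Python function. This is a model of the described behavior rather than the agent's own code: the function name, the use of SHA-256 over 64 random bytes, and the owner-only file mode on the generated file are assumptions chosen for illustration.

```python
import hashlib
import os


def load_or_create_secret(environ=os.environ):
    """Return the shared secret, or None if secret-based authorization is off."""
    path = environ.get("WALLABY_SECRET_FILE")
    if not path:
        return None  # variable unset: no shared secret
    if os.path.isfile(path):
        with open(path) as f:
            return f.read()  # the entire file contents are the secret
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        # Pathname includes nonexistent directory components.
        raise RuntimeError("secret file directory does not exist: " + parent)
    # Nonexistent file in an existing directory: generate a random secret.
    # (Digest algorithm and input size are assumptions for this sketch.)
    secret = hashlib.sha256(os.urandom(64)).hexdigest()
    # Create the file readable and writable by its owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secret)
    return secret
```

Passing a plain dictionary as `environ` makes it easy to try each case without touching the real environment.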

Caveats

The Wallaby agent’s authorization support is designed to prevent broker users from altering Condor pool configurations in excess of their authority. It is not intended to keep all configuration data strictly confidential. (This is not as bad as it might sound, since Wallaby-generated configurations are available for inspection by Condor users.) Furthermore, due to technical limitations, it is not possible to protect object property accesses over the API with the same authorization support that we use for API method invocations. Therefore, if concealing configuration data from some subset of users is important for your installation, you should prevent these users from authenticating to the broker that the Wallaby agent runs on.

Getting started

Here is a quick overview of how to get started with auth-enabled Wallaby:

  1. Stop your running Wallaby and restart your broker before starting the new Wallaby (this is necessary to pick up the new API methods). Set WALLABY_USERDB_NAME in your environment to a path where you can store the user-role database. Install and start your new Wallaby.
  2. If you’re using the RPM package, it will create a “secret file” for you in /var/lib/wallaby/secret. If not, you will need to set WALLABY_SECRET_FILE in the environment to specify a location for this secret file and then restart Wallaby. The Wallaby secret is a special token that can be passed to certain API methods (specifically, those related to user database management) in order to authorize users who aren’t authorized in the user database.
  3. Try using some of the new shell commands: wallaby set-user-role, wallaby list-users, and wallaby delete-user.
  4. Make sure that you have a secret in your secret file. Make a note of it. Try setting the role for your current broker user to READ or NONE (e.g. “wallaby set-user-role anonymous NONE”) and then see what happens when you try to run some other Wallaby shell commands. You can recover from this by passing the Wallaby secret to “wallaby set-user-role”; see its online help for details.

The default user database is empty, which will result in the same behavior as in older versions of Wallaby (viz., all actions are available to all broker users), but only until a user role is added, at which point all actions must be explicitly or implicitly authorized.

This article is cross-posted from Chapeau

Highly-available configuration data with Wallaby

Many Condor users are interested in high-availability (HA) services: they don’t want their compute resources to become unavailable due to the failure of a single machine that is running an important Condor daemon. (See this talk that Rob Rati and I gave at Condor Week this year for a couple of solutions to HA with the Condor schedd.) So it’s only natural that Condor users who are interested in configuring their pools with Wallaby might wonder how Wallaby responds in the face of failure.

Some of the technologies that the current version of Wallaby is built upon do not lend themselves to traditional active-active high-availability solutions. However, the good news is that due to Wallaby’s architecture, almost all running Condor nodes will not be affected by a failure of the Wallaby service or the machine it is running on. Nodes that have already checked in with Wallaby will keep the latest configuration that was activated as of their last check-in. The only limitations in the event of service failure are:

  1. new nodes will not be able to check in and get the default configuration;
  2. it will not be possible to access older activated configurations; and
  3. it will not be possible to alter, activate, or deploy the configuration.

These limitations, of course, disappear when the service is restored. For most users, losing the ability to update or deploy configurations due to a service failure — but not losing their deployed configurations or otherwise affecting normal pool operation — represents an acceptable risk. Some installations, especially those who aggressively exploit Wallaby’s scripting interface or versioning capability, may want a more robust solution: these users might want to be able to access older versions of their activated configurations even if Wallaby is down, or they might want a mechanism to speed recovery by starting a replica of their service on another machine. In the remainder of this post, we’ll discuss some approaches to provide more access to Wallaby data when Wallaby is down.

Accessing older versioned configurations

If you need to access historical versioned configurations, the easiest way to do it is to set up a cron job on the machine running Wallaby that periodically runs wallaby vc-export on your snapshot database and outputs versioned configurations to a shared filesystem. wallaby vc-export, which is documented in this post, exports all historical snapshots to plain-text files, so you can access the configuration for foo.example.com at version 1234 in a file called something like 1234/nodes/foo.example.com. This cron job needs to be able to access the filesystem path of the Wallaby snapshot database; furthermore, to run it efficiently, you’ll probably want to limit the number of snapshots it processes each time; see wallaby vc-export’s online help for more details.
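As a rough sketch of what such a cron job might run, the following Python helper assembles a wallaby vc-export command line. The --output-dir and --latest options come from the command's usage message; the helper name, both file paths, and the placement of options before the database file are assumptions for illustration only.

```python
import shlex


def vc_export_command(db_file, output_dir, latest=None):
    """Build an argument list for `wallaby vc-export`."""
    argv = ["wallaby", "vc-export", "--output-dir", output_dir]
    if latest is not None:
        # Limit the work done per run to the most recent snapshots.
        argv += ["--latest", str(latest)]
    argv.append(db_file)
    return argv


if __name__ == "__main__":
    # Both paths are placeholders; substitute your own.
    cmd = vc_export_command("/path/to/snapshot.db",
                            "/shared/wallaby-snapshots", latest=5)
    # Print a shell-quoted command suitable for a crontab entry.
    print(shlex.join(cmd))
```

The returned list can also be handed directly to `subprocess.run` if you would rather drive the export from a Python script than from a shell.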

Exporting Wallaby state to a file

If you’re just interested in the state of the Wallaby service (including possibly unactivated changes), you can periodically run wallaby dump over the network. This will produce a YAML file consisting of the serialized state of the Wallaby service; you can later load this file by using the wallaby load command, possibly against another Wallaby agent.

Backing up the raw database files

The easiest way to pick up and recover from a Wallaby node failure is to start a new Wallaby service with the same databases as the failed node. In turn, the easiest way to do this is by periodically copying these database files from their locations on the Wallaby node (typically in /var/lib/wallaby) to some location on a shared filesystem. The following Ruby script will safely copy the SQLite files that Wallaby uses even while Wallaby is running:

backup-db.rb script for live backup of Wallaby databases
#!/usr/bin/env ruby

# Takes a write-blocking lock on a SQLite database file and copies it to a backup
# usage:  backup-db.rb DBFILE.db BACKUPFILE.db
# author:  William Benton (willb@redhat.com)
# Public domain.

require 'sqlite3'
require 'fileutils'

def backup_db(db_file, backup_file)
  begin
    db = SQLite3::Database.new(db_file)

    db.transaction(:immediate) do |ignored|
      # starting a transaction with ":immediate" takes SQLite's RESERVED
      # lock: other connections can still read, but none can write, so the
      # file cannot change underneath us while we copy it
      FileUtils.cp(db_file, backup_file, :preserve=>true)
    end
  ensure
    # db is nil here if opening the database failed
    db.close if db
  end
end

abort "usage:  backup-db.rb DBFILE.db BACKUPFILE.db" unless ARGV.length == 2

backup_db(ARGV[0], ARGV[1])
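If you would rather not copy the file by hand, here is an alternative sketch in Python. Note that it uses a different technique from the Ruby script above: instead of holding a lock and copying the file, it calls SQLite's online backup API through the standard sqlite3 module (Connection.backup, available in Python 3.7 and later). It has not been tested against Wallaby's databases specifically, so treat it as a starting point.

```python
import sqlite3
import sys
from contextlib import closing


def backup_db(db_file, backup_file):
    """Copy a SQLite database to backup_file using SQLite's online backup API."""
    with closing(sqlite3.connect(db_file)) as src, \
            closing(sqlite3.connect(backup_file)) as dst:
        # backup() copies every page of src into dst; SQLite handles locking
        # so the copy is consistent even if src is open elsewhere.
        src.backup(dst)


if __name__ == "__main__" and len(sys.argv) == 3:
    backup_db(sys.argv[1], sys.argv[2])
```

Unlike a plain file copy, this approach does not preserve file ownership or timestamps on the backup, so adjust permissions on the destination yourself if that matters for your installation.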

(cross-posted from Chapeau)

Using the skeleton group

In Wallaby, Condor nodes are configured by applying features and parameter settings to groups. In order for the group abstraction to be fully general, Wallaby provides two kinds of special groups: the default group, which contains every node (but which is the lowest-priority membership for each node), and a set of identity groups, each of which only contains a single node (and which is always its highest-priority membership, so that special settings applied to a node’s identity group always take precedence over settings from that node’s other memberships).

The default group provides a convenient mechanism to apply some configuration to every node in a pool, whether or not it has been explicitly configured; this enables us to provide some sensible basic configuration for nodes that are added to the pool opportunistically or from the cloud. However, for some applications, using the default group may be inflexible because of Wallaby’s additive, compositional configuration model. For example, what if you want to enable some functionality by default but disable it on some subset of nodes? Or what if you want to use “defaults” to override some settings made in explicit groups?

The skeleton group provides a solution to these problems. (The “skeleton group” is so named by analogy with the /etc/skel directory on Linux systems, which provides a template for the home directories of newly-created users.) Like the default group, every new node is added to the skeleton group when it is created and receives the last-activated configuration for the skeleton group. Unlike the default group, individual nodes’ skeleton group memberships can be at different priorities; furthermore, nodes can be removed from the skeleton group altogether. In this post, we’ll see how to set it up and use it.

  1. Ensure that you’re running Wallaby 0.15.0 or later. (Some older versions also include experimental support for the skeleton group, but Wallaby 0.15.0 eliminates many rough edges and is the recommended minimum version for using the skeleton group.)
  2. Change the environment that your wallaby-agent runs in so that WALLABY_ENABLE_SKELETON_GROUP is set to true. The easiest way to do this is to modify the /etc/sysconfig/wallaby-agent file (on RHEL 5 and 6) or the /etc/sysconfig/wallaby-agent-env file (on Fedora).
  3. Restart Wallaby by running service wallaby restart (on RHEL 5 and 6) or systemctl restart wallaby.service (on Fedora).
  4. Create a new node, either explicitly (e.g. wallaby add-node foo.local.) or by having a new node check in; note that its memberships include +++SKEL. This is the skeleton group. (Preexisting nodes won’t be in the skeleton group by default, but it’s straightforward to add them all via the Wallaby shell.)

Now that you have it set up, here are some things to try doing with it:

  1. Put some of your default-group configuration in the skeleton group. Note that the skeleton group will be validated in isolation just as the default group is, but the validation procedure for the skeleton group assumes that the default group configuration has also been applied already — so your skeleton group configuration will need to be valid when applied atop the default group configuration. (This is so that newly-created nodes will have a valid configuration.)
  2. Put configuration that you want to override on some nodes in the skeleton group. Then remove these nodes from the skeleton group.
  3. Experiment with configurations that depend on the priority of the skeleton group; move it around in nodes’ membership lists.
  4. Write a Wallaby shell command to copy the configuration from the skeleton group over to a node’s identity group and then remove that node from the skeleton group (in order to allow further customization).

Let us know how you wind up using this functionality!

Troubleshooting Condor with Wallaby

Often, if you’re trying to reproduce a problem someone else is having with Condor, you’ll need their configuration. Likewise, if you’re trying to help someone reproduce a problem you’re having, you’ll want to send along your configuration to aid them in replicating your setup. For installations that use legacy flat-file configurations (optionally with a local configuration directory), this can be a pain, since you’ll need to copy several files from site to site (ensuring that you’ve included all the files necessary to replicate your configuration, perhaps across multiple machines on the site experiencing the problem).

If everyone involved uses Wallaby for configuration management, things can be a lot simpler: the site experiencing the problem can use wallaby dump to save the state of their configuration for an entire pool to a flat file, which the troubleshooting site can then inspect or restore with wallaby load. If the problem appears in some configuration snapshots but not in others, the reporting site can use wallaby vc-export to generate a directory of all of their configurations over time, so that the troubleshooting site can attempt to pinpoint the differences between what worked and what didn’t.

(Thanks to Matt for pointing out the value of versioned semantic configuration management in reproducing problems!)

Now powered by OctoPress

I noticed yesterday that this site had recently fallen victim to yet another WordPress security vulnerability; please accept my apologies if you’ve seen any unexpected behavior here. In other news, I’m pleased to announce that getwallaby.com is now powered by OctoPress. Please let me know if you find any quirks with the new site.

Exporting versioned configurations

Wallaby stores versioned configurations in a database. Wallaby API clients can access older versions of a node’s configuration by supplying the version option to the Node#getConfig method. Sometimes, though, we’d like to inspect individual configurations in greater detail than the API currently allows.

The Wallaby git repository now contains a command to export versioned configurations from a database to flat text files. Clone the repository or just download the file and then place cmd_versioned_config_export.rb somewhere in your WALLABY_SHELL_COMMAND_PATH, and you’ll be able to invoke it like this:

Usage:  wallaby vc-export DBFILE
exports versioned configurations from DBFILE to plain text files
    -h, --help                       displays this message
        --verbose                    outputs information about command progress
    -o, --output-dir DIR             set output directory to DIR
        --earliest NUM               output only the earliest NUM configurations
        --latest NUM                 output only the most recent NUM configurations
        --since DATE                 output only configurations since the given date

It will create a directory (called snapshots by default) with subdirectories for each versioned configuration; each of these will be timestamped with the time of the configuration. Within each version directory, there will be directories for node configurations and stored group configurations. (If you’re using an older version of Wallaby, the only stored group configuration will be for the default group. Versioned configurations generated with a fairly recent version of Wallaby, on the other hand, will have stored configurations for more groups and should also have useful information about node memberships in the configuration file.)

wallaby vc-export allows you both to back up versioned configurations and simply to inspect them. Let us know if you find a new application for it, or if you find it useful in other ways!

Node tagging in a Wallaby client library

In an earlier post, I presented a technique for adding node tagging to Wallaby without adding explicit tagging support to the Wallaby API. Node tags are useful for a variety of reasons: they can correspond to informal user-supplied classifications of nodes or machine-generated system attributes (e.g. “64-bit”, “high-memory”, “infiniband”). Since we implemented tags as special Wallaby groups, they may contain configuration information (although they don’t need to) and will be advertised as one of the machine’s Condor attributes.

The Wallaby source repository now includes a small Python module that patches the Wallaby Python client library with idiomatic tagging support: namely, you can ask the store for the name of the group that represents the partition between tags and groups, and you can call getTags() and modifyTags() methods on Node objects, as well as inspect the tags property of a Node. This will appear in Wallaby packages in the future, but you don’t have to be that adventurous to try it out now! To use it, clone the repository and put extensions/tagging.py and schema/wallaby.py somewhere in your PYTHONPATH. Then follow along with this transcript:

tagging-example.py
import wallaby

# the "tagging" module patches the Wallaby client library with 
# support for tag operations
import tagging

# We'll start by setting up a Wallaby client library session against
# the broker on localhost
from qmf.console import Session
console = Session()
console.addBroker()
raw_store, = console.getObjects(_class="Store")
store = wallaby.Store(raw_store, console)

# call this method after the store client is initialized so that
# the tagging library knows how to create missing groups
tagging.setup(store)

# Clear out all of the memberships on a fake node
store.getNode("fake-node.example.com").modifyMemberships("REPLACE", [], {})

# After clearing a node's membership list, it will have no tags
store.getNode("fake-node.example.com").getTags()

# You can also access the tags of a given node via its "tags" attribute
store.getNode("fake-node.example.com").tags

# by convention, we're preceding tags with "@" here
store.getNode("fake-node.example.com").modifyTags("ADD", ["@Foo", "@Blitz"], create_missing_tags=True)

# This should return ["@Foo", "@Blitz"]
store.getNode("fake-node.example.com").tags

# This should return ["===TAGS_BELOW===", "@Foo", "@Blitz"]
store.getNode("fake-node.example.com").memberships

I appreciate any feedback or comments on this library.

(Cross-posted from Chapeau.)

Using Wallaby groups to implement node tagging

One of the great things about Wallaby is that it’s a platform, not merely a tool. Put another way, if it doesn’t do exactly what you want, you can use its API to build tools that benefit from configuration validation and deployment. We’ve talked in the past about a number of useful tools built on the Wallaby API. (Another cool Wallaby API client that I hope to talk about more in the future is Erik Erlandson’s Albatross project, which programmatically generates and changes pool configurations in order to test Condor scale and functionality.)

The Wallaby API is designed to be sufficiently general to allow developers to do just about anything with configuration data, not to unnecessarily restrict users to a few use cases that we thought of. Because of this generality, some tasks might require adopting application-level conventions. In this article, we’ll cover one such convention and see how the Wallaby API is flexible enough to handle an interesting use case – namely, tagging nodes with various keywords, perhaps as supplied by a user or generated by an agent like sesame or matahari.

First, though, we’ll review how configurations are generated and applied to nodes. Recall that a node is a member of several groups. These memberships are ordered: a node’s lowest-priority membership is always in the special default group (which includes every node), and its highest-priority membership is always in a special, node-specific identity group (which only includes a single node). Zero or more memberships in explicit groups may occupy the priority space between the default group and a node’s identity group. In the illustration below, node.local is a member of two explicit groups, which have blue backgrounds: “EC2 submit nodes,” and “Execute nodes.”

[Figure: a simple set of Wallaby group memberships for node.local.]

When Wallaby calculates the configuration for node.local., it will begin with a copy of the default group’s configuration. It will then repeatedly apply the configurations for the explicit groups and identity group in order of increasing priority, so that parameter-value mappings from higher-priority groups take precedence over lower-priority ones (and thus either replace these or are appended to them, depending on the mapping kind). The condor_configd, which takes a node’s configuration from Wallaby and installs it on a node, will also cause the Wallaby groups a node is a member of (as well as the Wallaby features it has installed) to be advertised for use in Condor matchmaking. So a Condor job could specify that it wanted to match against a node that was a member of the “EC2 submit nodes” group; this would translate into a preference for node.local.
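The layering described above can be illustrated with a short Python sketch. It is a simplified model, not Wallaby's implementation: in particular, representing the "append" mapping kind as a set of parameter names and joining appended values with a comma are hypothetical choices made only to show the two behaviors side by side.

```python
def compose_config(groups_low_to_high, append_params=frozenset()):
    """Merge group configurations in order of increasing priority.

    groups_low_to_high: list of dicts mapping parameter names to values,
        with the default group first and the identity group last.
    append_params: hypothetical set of parameter names whose values are
        appended to lower-priority values instead of replacing them.
    """
    config = {}
    for group in groups_low_to_high:
        for param, value in group.items():
            if param in append_params and param in config:
                # Append to whatever lower-priority groups contributed.
                config[param] = config[param] + ", " + value
            else:
                # Higher-priority value replaces any lower-priority one.
                config[param] = value
    return config
```

For example, if the default group sets a parameter that an explicit group also sets, the explicit group's value wins unless that parameter is treated as appendable, in which case both values survive in priority order.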

Because group names appear in machine ClassAd attributes, the list of explicit node memberships is a natural place to put tag information. Nodes would simply be members of “dummy” groups like “Desktop workstations” or “Machines with more than 8GB RAM,” and these keywords could be used in matchmaking or in searching for particular nodes. However, the list of explicit groups is not necessarily suitable for automatic manipulation: users will not necessarily expect their changes to be overridden, other API clients won’t necessarily expect users to rearrange or remove groups corresponding to tags, and in general it is impossible to determine whether a group membership should be interpreted as a tag or as an explicit membership.

We can adopt a convention to partition the space of group memberships. Say we create a special sentinel group to partition the membership space: every node will be a member of the sentinel group. All memberships that are of a lower priority than the sentinel group will be managed by tagging agents, and all memberships that are of a higher priority will be explicitly managed by the user. In the example below, “–EXPLICIT GROUPS” is the sentinel group, and groups in yellow correspond to tags.

[Figure: Wallaby memberships, with a sentinel group and tag groups]

This approach demonstrates the generality of the Wallaby API, and allows users to supply tag-specific configurations by installing parameters or features on tag groups (for example, ensuring that every desktop workstation has a policy that favors its owner, or overprovisioning high-memory nodes). However, it is not free of shortcomings. Firstly, Wallaby will present all of these groups – sentinel, tag, and explicit groups – so a user-facing tool that puts a friendly interface on node memberships and tagging will need to use the sentinel group to partition which groups it displays to the user. Secondly, the sentinel group must be added to all nodes (in practice, we can assume that the absence of a sentinel group implies the absence of tags); in general, we are relying on clients to enforce invariants in this case. Finally, altering a node’s membership list to insert a tag will cause that node’s configuration to be validated at the next activation. If the node has a complex configuration, this could be expensive.
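A client that wants to show users only their explicit groups, or only their tags, might split the membership list along these lines. The sketch below is hypothetical: the function name is invented, the sentinel's name is passed in rather than assumed, and the list is taken to run from highest to lowest priority, so that tag groups follow the sentinel (this matches the memberships shown in the tagging transcript elsewhere on this page).

```python
def split_memberships(memberships, sentinel):
    """Split a membership list into (explicit_groups, tags) at the sentinel.

    Assumes the list is ordered from highest to lowest priority, so tag
    groups follow the sentinel. A missing sentinel means the node has no tags.
    """
    if sentinel not in memberships:
        return list(memberships), []
    i = memberships.index(sentinel)
    # Everything before the sentinel is user-managed; everything after is tags.
    return memberships[:i], memberships[i + 1:]
```

Because the invariant is enforced only by convention, a tool built this way should tolerate nodes that lack the sentinel group, as the function does by reporting no tags.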

(Cross-posted from Chapeau.)