Monday, September 29, 2014

Docker: Re-using a custom base image - Pulp Resource Manager image.

Here's the next step in the ongoing saga of containerizing the Pulp service in Docker for use with Kubernetes.

In the last post I spent a bunch of effort creating a base image for a set of Pulp service components, but I only implemented one of them: the Celery beat server.  In this (hopefully much shorter) post I'll create a second image from that base.  This one is going to be the Pulp Resource Manager service.

A couple of recap pieces to start.

The Pulp service is made up of several independent processes that communicate using AMQP messaging (through a QPID message bus) and by access to a MongoDB database.  The QPID services and the MongoDB services are entirely independent of the Pulp service processes and communicate only over TCP/IP.    There are also a couple of processes that are tightly coupled, both requiring access to shared data.  These will come later.  What's left is the Pulp Resource Manager process and the Pulp Admin REST service.

I'm going to take these in two separate posts to make them a bit more digestible than the last one was.

Extending the Base - Again


As with the Pulp Beat service, the Resource Manager process is a singleton: each Pulp service has exactly one.  (Discussions of HA and SPOF will be held for later.) The Resource Manager process communicates with the other components solely through the QPID message broker and the MongoDB over TCP.  There is no need for persistent storage.

In fact the only difference between the Beat service and the Resource Manager is the invocation of the Celery service.  This means that the only differences between the Docker specifications are the name and two sections of the run.sh file.

The Dockerfile is in fact identical in content to that for the Pulp Beat container:
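The gist with that Dockerfile isn't reproduced here, so this is a minimal sketch of what the shared layout amounts to. The base image name (markllama/pulp-base) and the MAINTAINER line are assumptions; only the contents of run.sh differ between the two derived images.

# Same Dockerfile for pulp-beat and pulp-resource-manager; only run.sh differs
FROM markllama/pulp-base
MAINTAINER markllama

ADD run.sh /run.sh
CMD ["/run.sh"]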

Now to the run.sh script.


The first difference in the run.sh is simple. The Beat service is used to initialize the database. The Resource Manager doesn't have to do that.

The second is also pretty simple: the exec line at the end starts the Celery service using the resource_manager entry point instead of the beat service.
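For reference, here is a sketch of what those two exec lines boil down to, reconstructed from the startup traces in this and the previous post. The use of ${PULP_SERVER_NAME} in the node name is an assumption; the traces only show the expanded value.

# pulp-beat run.sh ends with:
exec runuser apache -s /bin/bash -c \
    "/usr/bin/celery beat --workdir=/var/lib/pulp/celery \
     --scheduler=pulp.server.async.scheduler.Scheduler \
     -f /var/log/pulp/celerybeat.log -l INFO"

# pulp-resource-manager run.sh ends with:
exec runuser apache -s /bin/bash -c \
    "/usr/bin/celery worker -c 1 -n resource_manager@${PULP_SERVER_NAME} \
     --events --app=pulp.server.async.app --umask=18 --loglevel=INFO \
     -Q resource_manager --logfile=/var/log/pulp/resource_manager.log"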

I do have one other note to myself.  It appears that the wait_for_database() function will be needed in every derivative of the pulp-base image.  I should probably refactor that but I'm not going to do it yet.

One Image or Many?


So, if I hadn't been using shell functions, this really would come down to a two-line difference between the two images. Does it really make sense to create two images?  It is possible to pass a mode argument to the container on startup.  Wouldn't that be simpler?

It actually might be. It is possible to use the same image and pass an argument.  The example from which mine are derived used that method.

I have three reasons for using separate images.  One is for teaching and the other two are development choices. Since one of my goals is to show how to create custom base images and then use derived images to add customizations, this is a good opportunity to do exactly that.

The deeper reasons have to do with human nature and the software development life cycle.

People expect to be able to compose services by grabbing images off the shelf and plugging them together. Adding modal switches to the images means that they are not strongly differentiated by function. You can't just say "Oh, I need 5 functional parts, let me check the bins".  You have to know more about each image than just how it connects to others.  You have to know that this particular image can take more than one role within the service.  I'd like to avoid that if I can. Creating images with so little difference feels like inefficiency, but only when viewed from the standpoint of the person producing the images.  To the consumer it maintains the usage paradigm. Breaks in the paradigm can lead to mistakes or confusion.

The other reason to use distinct images has to do with what I expect and hope will be a change in the habits of software developers.

Developers of complex services currently feel a tension, when they are creating and packaging their software, between putting all of the code, binaries and configuration templates into a single package and splitting them out by function. The convention is to create a new package only if the function is strongly different: that makes it simpler to install the software and configure it once.  On traditional systems, where all of the process components would be running on the same host, there was no good reason to separate the code for distinct processes based on their function. There are clear cases where the separation does happen in host software packaging, notably in client and server software, which will clearly run on different hosts.  Other cases, though, are not clear cut.

The case of the Pulp service is in a gray area.  Much of the code is common to all four Celery based components (beat, resource manager, worker and admin REST service).  It is likely possible to refactor the unique code into separate packages for the components, though the value is questionable at this point.

I want to create distinct images because it's not very expensive, and it allows for easy refactoring should the Pulp packaging ever be decomposed to match the actual service components.  Any changes would happen when the new images are built, but the consumer would not need to see any change.  This is a consideration to keep in mind whenever I create a new service with different components from the same service RPM.

Running and Verifying the Resource Manager Image


The Pulp Resource Manager process makes the same connections that the Pulp Beat process does.  It's a little harder to detect the Resource Manager access to the database since the startup doesn't make radical changes like the DB initialization.  I'm going to see if I can find some indications that the resource manager is running though. The QPID connection will be much easier to detect. The Resource Manager creates its own set of queues which will be easy to see.

The resource manager requires the database service and an initialized database.  Testing this part will start where the previous post left off, with running QPID and MongoDB and with the Pulp Beat service active.

NOTE: there's currently (20140929) a bug in Kubernetes where, during the period between waiting for the image to download and when it actually starts, kubecfg list pods will indicate that the pods have terminated.  If you see this, give it another minute for the pods to actually start and transfer to the running state.

Testing in Docker


All I need to do using Docker directly is to verify that the container will start and run. The visibility in Kubernetes still isn't up to general dev and debugging.

docker run -d --name pulp-resource-manager \
  -v /dev/log:/dev/log \
  -e PULP_SERVER_NAME=pulp.example.com \
  -e SERVICE_HOST=10.245.2.2 \
  markllama/pulp-resource-manager
0e8cbc4606cf8894f8be515709c8cd6a23f37b3a58fd84fecf0d8fca46c64eed

 docker ps
CONTAINER ID        IMAGE                                    COMMAND             CREATED             STATUS              PORTS               NAMES
0e8cbc4606cf        markllama/pulp-resource-manager:latest   "/run.sh"           9 minutes ago       Up 9 minutes                            pulp-resource-manager

Once it's running I can check the logs to verify that everything has started as needed and that the primary process has been executed at the end.

docker logs pulp-resource-manager
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ start_resource_manager
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery worker -c 1 -n resource_manager@pulp.example.com --events --app=pulp.server.async.app --umask=18 --loglevel=INFO -Q resource_manager --logfile=/var/log/pulp/resource_manager.log'

If it fails to start, especially with "file not found" or "no access" errors, check the /dev/log volume mount and the SERVICE_HOST value.

I also want to check that the QPID queues have been created.


qpid-config queues -b guest@10.245.2.4
Queue Name                                       Attributes
======================================================================
04f58686-35a6-49ca-b98e-376371cfaaf7:1.0         auto-del excl 
06fa019e-a419-46af-a555-a820dd86e66b:1.0         auto-del excl 
06fa019e-a419-46af-a555-a820dd86e66b:2.0         auto-del excl 
0c72a9c9-e1bf-4515-ba4b-0d0f86e9d30a:1.0         auto-del excl 
celeryev.ed1a92fd-7ad0-4ab1-935f-6bc6a215f7d3    auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
e70d72aa-7b9a-4083-a88a-f9cc3c568e5c:0.0         auto-del excl 
e7e53097-ae06-47ca-87d7-808f7042d173:1.0         auto-del excl 
resource_manager                                 --durable --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.celery.pidbox  auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.dq             --durable auto-del --argument passive=False --argument exclusive=False --argument arguments=None


Line 8 looks like the Celery Beat service queue and lines 11, 12, and 13 are clearly associated with the resource manager. So far, so good.

Testing in Kubernetes


I had to reset the database between starts to test the Pulp Beat container.  This image doesn't change the database structure, so I don't need to reset.  I can just create a new pod definition and try it out.

Again, the differences from the Pulp Beat pod definition are pretty trivial.
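The pod file itself isn't reproduced here; this is a sketch of pods/pulp-resource-manager.json reconstructed from the desiredState section of the kubecfg --json output shown further down, so treat the field order and spacing as approximate.

{
  "id": "pulp-resource-manager",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "pulp-resource-manager",
      "containers": [
        {
          "name": "pulp-resource-manager",
          "image": "markllama/pulp-resource-manager",
          "env": [
            { "name": "PULP_SERVER_NAME", "value": "pulp.example.com" }
          ],
          "volumeMounts": [
            { "name": "devlog", "mountPath": "/dev/log", "readOnly": false }
          ]
        }
      ],
      "volumes": [
        { "name": "devlog", "source": { "hostDir": { "path": "/dev/log" } } }
      ]
    }
  },
  "labels": { "name": "pulp-resource-manager" }
}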

So here's what it looks like when I start the pod:

kubecfg -c pods/pulp-resource-manager.json create pods
I0930 00:00:24.581712 16159 request.go:292] Waiting for completion of /operations/14
ID                      Image(s)                          Host                Labels                       Status
----------              ----------                        ----------          ----------                   ----------
pulp-resource-manager   markllama/pulp-resource-manager   /                   name=pulp-resource-manager   Waiting

kubecfg list pods
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulpdb                  markllama/mongodb                 10.245.2.2/10.245.2.2   name=db                      Running
pulpmsg                 markllama/qpid                    10.245.2.2/10.245.2.2   name=msg                     Running
pulp-beat               markllama/pulp-beat               10.245.2.4/10.245.2.4   name=pulp-beat               Terminated
pulp-resource-manager   markllama/pulp-resource-manager   10.245.2.4/10.245.2.4   name=pulp-resource-manager   Terminated

kubecfg get pods/pulp-resource-manager
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulp-resource-manager   markllama/pulp-resource-manager   10.245.2.4/10.245.2.4   name=pulp-resource-manager   Running

There are two things of note here. Line 13 shows the pulp-resource-manager pod as terminated. Remember the bug note from above: the pod isn't really terminated, it's in the window between the pause container pulling the image for the new container and that container actually executing.

On line 15 I requested the information for that pod by name using the get command, rather than listing them all.  This time it shows Running, as it should.

When you use get, all you get by default is a one-line summary.  If you want details you have to consume them as JSON, and they're complete: in fact they use the same schema as the JSON used to create the pods in the first place (with a bit more detail filled in). While this could be hard for humans to swallow, it makes it AWESOME to write programs and scripts to process the output. Every command should offer some form of structured data output. Meanwhile, I wish Kubernetes would offer a --verbose option with nicely formatted plaintext.  It will come (or I'll write it if I get frustrated enough).

Get ready... Here it comes.

kubecfg --json get pods/pulp-resource-manager | python -m json.tool
{
    "apiVersion": "v1beta1",
    "creationTimestamp": "2014-09-30T00:00:24Z",
    "currentState": {
        "host": "10.245.2.4",
        "hostIP": "10.245.2.4",
        "info": {
            "net": {
                "detailInfo": {
                    "Args": null,
                    "Config": null,
                    "Created": "0001-01-01T00:00:00Z",
                    "Driver": "",
                    "HostConfig": null,
                    "HostnamePath": "",
                    "HostsPath": "",
                    "ID": "",
                    "Image": "",
                    "Name": "",
                    "NetworkSettings": null,
                    "Path": "",
                    "ResolvConfPath": "",
                    "State": {
                        "ExitCode": 0,
                        "FinishedAt": "0001-01-01T00:00:00Z",
                        "Paused": false,
                        "Pid": 0,
                        "Running": false,
                        "StartedAt": "0001-01-01T00:00:00Z"
                    },
                    "SysInitPath": "",
                    "Volumes": null,
                    "VolumesRW": null
                },
                "restartCount": 0,
                "state": {
                    "running": {}
                }
            },
            "pulp-resource-manager": {
                "detailInfo": {
                    "Args": null,
                    "Config": null,
                    "Created": "0001-01-01T00:00:00Z",
                    "Driver": "",
                    "HostConfig": null,
                    "HostnamePath": "",
                    "HostsPath": "",
                    "ID": "",
                    "Image": "",
                    "Name": "",
                    "NetworkSettings": null,
                    "Path": "",
                    "ResolvConfPath": "",
                    "State": {
                        "ExitCode": 0,
                        "FinishedAt": "0001-01-01T00:00:00Z",
                        "Paused": false,
                        "Pid": 0,
                        "Running": false,
                        "StartedAt": "0001-01-01T00:00:00Z"
                    },
                    "SysInitPath": "",
                    "Volumes": null,
                    "VolumesRW": null
                },
                "restartCount": 0,
                "state": {
                    "running": {}
                }
            }
        },
        "manifest": {
            "containers": null,
            "id": "",
            "restartPolicy": {},
            "version": "",
            "volumes": null
        },
        "podIP": "10.244.3.4",
        "status": "Running"
    },
    "desiredState": {
        "host": "10.245.2.4",
        "manifest": {
            "containers": [
                {
                    "env": [
                        {
                            "key": "PULP_SERVER_NAME",
                            "name": "PULP_SERVER_NAME",
                            "value": "pulp.example.com"
                        }
                    ],
                    "image": "markllama/pulp-resource-manager",
                    "name": "pulp-resource-manager",
                    "volumeMounts": [
                        {
                            "mountPath": "/dev/log",
                            "name": "devlog",
                            "path": "/dev/log"
                        }
                    ]
                }
            ],
            "id": "pulp-resource-manager",
            "restartPolicy": {
                "always": {}
            },
            "uuid": "c73a89c0-4834-11e4-aba7-0800279696e1",
            "version": "v1beta1",
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "emptyDir": null,
                        "hostDir": {
                            "path": "/dev/log"
                        }
                    }
                }
            ]
        },
        "status": "Running"
    },
    "id": "pulp-resource-manager",
    "kind": "Pod",
    "labels": {
        "name": "pulp-resource-manager"
    },
    "resourceVersion": 20,
    "selfLink": "/api/v1beta1/pods/pulp-resource-manager"
}


So there you go.
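Since the output uses the same schema as the pod definitions, pulling a single field out of it is easy. Here's a minimal sketch using the same python json module I've been using for pretty-printing; the JSON path comes straight from the dump above.

kubecfg --json get pods/pulp-resource-manager | \
    python -c 'import json,sys; print(json.load(sys.stdin)["currentState"]["status"])'

For the pod above that should print Running.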

I won't repeat the QPID queue check here because if everything's going well it looks the same.

Summary


As designed there isn't really much to say.  The only real changes were to remove the DB setup and change the exec line to start the resource manager process.   That's the idea of cookie cutters.

The next one won't be as simple.  It uses the Pulp software package, but it doesn't run a Celery service.  Instead it runs an Apache daemon and a WSGI web service to offer the Pulp Admin REST protocol. It connects to the database and the messaging service.  It also needs SSL and a pair of external public TCP connections.

References


  • Docker
    Containerized Applications
  • Kubernetes
    Orchestration for Docker applications
  • Pulp
    Enterprise OS and configuration content management
  • Celery
    A distributed job management framework
  • QPID
    AMQP Message service
  • MongoDB
    NoSQL Database

Friday, September 26, 2014

Docker: Building and using a base image for Pulp services in Kubernetes

My stated goal in this series of posts is to create a working containerized Pulp service running in a Kubernetes cluster. After, what is it, 5 posts, I'm finally actually ready to do something with pulp itself.

The Pulp service proper is made up of a single Celery beat process, a single resource manager process, and some number of pulp worker processes. These together do the work of Pulp, mirroring and managing the content that is Pulp's payload. The service also requires at least one Apache HTTP server to deliver the payload but that comes later.

All of the Pulp processes are actually built on Celery.  They all require the same set of packages and much of the same configuration information.  They all need to use the MongoDB and QPID services.  The worker processes all need access to some shared storage, but the beat and resource manager do not.

To build the Docker images for these different containers, rather than duplicating the common parts, the best practice is to put those parts into a base image and then add one last layer to create each of the variations.

In this post I'll demonstrate creating a shared base image for Pulp services and then I'll create the first image that will consume the base to create the Pulp beat service.

The real trick is to figure out what the common parts are. Some are easy though so I'll start there.

Creating a Base Image


For those of you who are coders, a base image is a little like an abstract class.  It defines some important characteristics that are meant to be re-used, but it leaves others to be resolved later.  The Docker community already provides a set of base images like the Fedora:20 image which have been hand-crafted to provide a minimal OS.  Docker makes it easy to use the same mechanism for building our own images.

The list below enumerates the things that all of the Pulp service images will share.  When I create the final images I'll add the final tweaks.  Some of these will essentially be stubs to be used later.

  • Pulp Repo file
    Pulp is not yet standard in the RHEL, CentOS or Fedora distributions
  • Pulp Server software
  • Communications Software (MongoDB and QPID client libraries)
  • Configuration tools: Augeas

There is also some configuration scripting that will be required by all the pulp service containers:

  • A script to apply the customization/configuration for the execution environment
  • A test script to ensure that the database is available before starting the celery services
  • A test script to ensure that the message service is available
Given that start, here's what I get for the Dockerfile:
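The Dockerfile itself was embedded as a gist in the original post. Below is a rough sketch assembled from the notes that follow; the repo URL, the pulp-server-qpid group name, the QPID library names and the lens path are assumptions rather than the author's exact file, and the line numbers quoted in the notes refer to the original, so they won't line up exactly with this sketch.

FROM fedora:20
MAINTAINER markllama

# Pulp logs to syslog: run containers from this image with -v /dev/log:/dev/log

# Yum repo for the Pulp packages (URL is an assumption; see the Pulp docs)
ADD https://repos.fedorapeople.org/repos/pulp/pulp/fedora-pulp.repo /etc/yum.repos.d/fedora-pulp.repo

# Pulp server group, the RPM content plugin, QPID client libraries and Augeas
RUN yum -y groupinstall pulp-server-qpid && yum clean all
RUN yum -y install python-pulp-rpm-common python-qpid python-qpid-qmf augeas && \
    yum clean all

# COMMENTED: Docker content plugin - not packaged yet
# RUN yum -y install pulp-docker-plugins

# Augeas lens so augtool can edit /etc/pulp/server.conf
ADD pulpserver.aug /usr/share/augeas/lenses/dist/pulpserver.aug

# Configuration script shared by all of the derived images
ADD configure_pulp_server.sh /configure_pulp_server.sh

# Test script: is the MongoDB reachable?
ADD test_db_available.py /test_db_available.py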


Lines 1 and 2 should be familiar already.  There are no new directives here but a couple of things need explaining.


  • Line 1: The base image
  • Line 2: Contact information
  • Line 4: A usage comment
    Pulp uses syslog.  For a process inside a container to write to syslog you either have to have a syslogd running or you have to have write access to the host's /dev/log file.  I'll show how this gets done when I create a real app image from this base and run it.
  • Line 6: Create a yum repo for the Pulp package content.
    You can add files using a URL for the source.
  • Lines 9-12: Install the Pulp packages, QPID client software and Augeas to help with configuration.
  • Lines 15-17: COMMENTED: Install and connect the Docker content plugin
    This is commented out at the moment.  It hasn't been packaged yet and there are some issues with dependency resolution. I left it here to remind me to put it back when the problems are resolved.
  • Line 20: Add an Augeas lens definition to manage the Pulp server.conf file
    Augeas is well suited to managing config values when a lens exists.  More detail below.
  • Line 23: Add a script to execute the configuration
    This will be used by the derived images, but it works the same for all of them
  • Line 27: Add a script which can test for access to the MongoDB
    Pulp will just blindly try to connect, but will just hang if the DB is unavailable.  This script allows me to decide to wait or quit if the database isn't ready. If I quit, Kubernetes will re-spawn a new container to try again.

The Pulp Repo


The Pulp server software is not yet in the standard Fedora or EPEL repositories. The packages are available from the contributed repositories on the Fedora project.  The repo file is also there, accessible through a URL.

The Docker ADD directive can take a URL as well as a local relative file path.

The ADD directive pulls the Pulp repo file down and places it so that it can be used in the next step.

Pulp Packages (dependencies and tools)


The Pulp software is most easily installed as a YUM package group.  I use a Dockerfile RUN directive to install the Pulp packages into the base image. This will install most of the packages needed for the service, but there are a couple of additional packages that aren't part of the package group.

Pulp can serve different types of repository mirrors.  These are controlled by content plugins.  I add the RPM plugin, python-pulp-rpm-common.  I also add a couple of Python QPID libraries. However, you can't run both yum groupinstall and the normal package install command in the same invocation, so the additional Python QPID libraries are installed in a second command.

I also want to install Augeas. This is a tool that enables configuration editing using a structured API or CLI command.

Augeas Lens for Pulp INI files


Augeas is an attempt to wrangle the flat file databases that make up the foundation of most *NIX application configuration. It offers a way to access individual key/value pairs within well known configuration files without resorting to tools like sed or perl and regular expressions.  With Augeas each key/value pair is assigned a path and can be queried and updated using that path.  It offers both API and CLI interfaces though it's not nearly as commonly used as it should be.

The down side of Augeas is that it doesn't include a description (lens in Augeas terminology) for Pulp config files.  Pulp is too new. The upside is that the Pulp config files are fairly standard INI format, and it's easy to adapt the stock IniFile lens for Pulp.

I won't include the lens text inline here, but I put it in a gist if you want to look at it.

The ADD directive on line 20 of the Dockerfile places the lens file in the Augeas library where it will be found automatically.
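To see what the lens buys you, here's a quick sketch of querying one of the Pulp settings by its Augeas path from inside a container built on this base. The path is the same one the configuration script uses in the startup traces further down; augtool print just reads the current value instead of setting it.

augtool print "/files/etc/pulp/server.conf/target[. = 'server']/server_name"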

Pulp Server Configuration Script


All of the containers that use this base image will need to set a few configuration values for Pulp. These reside in /etc/pulp/server.conf which is an INI formatted text file.  These settings indicate the identity of the pulp service itself and how the pulp processes communicate with the database and message bus.

If you are starting a Docker container by hand you could either pass these values in as environment variables using the
-e (--env) option or accept additional positional arguments through the CMD. You'd have to establish the MongoDB and QPID services, then get their IP addresses from Docker and feed the values into the Pulp server containers.

Since Kubernetes is controlling the database and messaging pods and has the Service objects defined, it knows how to tell the Pulp containers where to find these services.  It sets a few environment variables for every new container that starts after the service object is created.  A new container can use these values to reach the external services it needs.

Line 23 of the Dockerfile adds a short shell script which can accept the values from the environment variables that Kubernetes provides and configure them into the Pulp configuration.

The script gathers the set of values it needs from the variables (or sets reasonable defaults) and then, using augtool (the CLI tool for Augeas), it updates the values in the server.conf file.

This is the snippet from the beginning of the configure_pulp_server.sh script which sets the environment variables.

# Take settings from Kubernetes service environment unless they are explicitly
# provided
PULP_SERVER_CONF=${PULP_SERVER_CONF:=/etc/pulp/server.conf}
export PULP_SERVER_CONF

PULP_SERVER_NAME=${PULP_SERVER_NAME:=pulp.example.com}
export PULP_SERVER_NAME

SERVICE_HOST=${SERVICE_HOST:=127.0.0.1}
export SERVICE_HOST

DB_SERVICE_HOST=${DB_SERVICE_HOST:=${SERVICE_HOST}}
DB_SERVICE_PORT=${DB_SERVICE_PORT:=27017}
export DB_SERVICE_HOST DB_SERVICE_PORT

MSG_SERVICE_HOST=${MSG_SERVICE_HOST:=${SERVICE_HOST}}
MSG_SERVICE_PORT=${MSG_SERVICE_PORT:=5672}
MSG_SERVICE_USER=${MSG_SERVICE_USER:=guest}
export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME

These are the values that the rest of the script will set into /etc/pulp/server.conf
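The functions that push those values into server.conf are thin wrappers around augtool. This is a sketch reconstructed from the set -x traces later in this post, not the verbatim script:

configure_server_name() {
    augtool -s set \
        "/files/etc/pulp/server.conf/target[. = 'server']/server_name" \
        ${PULP_SERVER_NAME}
}

configure_database() {
    augtool -s set \
        "/files/etc/pulp/server.conf/target[. = 'database']/seeds" \
        ${DB_SERVICE_HOST}:${DB_SERVICE_PORT}
}

configure_messaging() {
    augtool -s set \
        "/files/etc/pulp/server.conf/target[. = 'messaging']/url" \
        tcp://${MSG_SERVICE_HOST}:${MSG_SERVICE_PORT}
    augtool -s set \
        "/files/etc/pulp/server.conf/target[. = 'tasks']/broker_url" \
        qpid://${MSG_SERVICE_USER}@${MSG_SERVICE_HOST}:${MSG_SERVICE_PORT}
}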

UPDATE: As of the middle of October 2014 the SERVICE_HOST variable has been removed. Now each service gets its own IP address, so the generic SERVICE_HOST variable no longer makes sense.  Each service variable must be provided explicitly when testing. Also, for testing, the master host will provide a proxy to the service.  However, as of this update the mechanism isn't working yet. I'll update this post when it is working properly.  If you are building from git source you can use a commit prior to 10/14/2014 and you can still use SERVICE_HOST to test against the minions.

Container Startup and Remote Service Availability


When the Pulp service starts up it will attempt to connect to a MongoDB and to a QPID message broker.  If the database isn't ready, the Pulp service may just hang.

With Kubernetes it's best not to assume that the containers will arrive in any particular order.  If the database service is unavailable, the pulp containers should just die.  Kubernetes will notice and attempt to restart them periodically.  When the database service is available the next client container will connect successfully and... not die.

I have added a check script to the base container which can be used to test the availability (and the correct access information) for the MongoDB.  It also uses the environment variables provided by Kubernetes when the container starts.

This script merely returns a shell true (return value: 0) if the database is available and false (return value: 1) if it fails to connect.  This allows the startup script for the actual pulp service containers to check before attempting to start the pulp process and to cleanly report an error if the database is unavailable before exiting.
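The wait loop in the run scripts is built around that exit code. Here's a sketch of wait_for_database reconstructed from the set -x traces; the error message at the end is my own placeholder, not the author's.

wait_for_database() {
    DB_TEST_TRIES=12
    DB_TEST_POLLRATE=5
    TRY=0
    # poll until the test script succeeds or we run out of tries
    while [ $TRY -lt $DB_TEST_TRIES ] ; do
        if /test_db_available.py ; then
            break
        fi
        sleep $DB_TEST_POLLRATE
        TRY=$(($TRY + 1))
    done
    if [ $TRY -ge $DB_TEST_TRIES ] ; then
        echo "ERROR: cannot reach MongoDB on ${DB_SERVICE_HOST}:${DB_SERVICE_PORT}"
        exit 1
    fi
}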

I haven't included a script to test the QPID connectivity.  So far I haven't seen a pulp service fail to start because the QPID service was unavailable when the client container starts.

Scripts are not executed in the base image


The scripts listed above are provided in the base image, but the base image has no ENTRYPOINT or CMD directives.  It is not meant to be run on its own.

Each of the Pulp service images that uses this base will need to have a run script which will call these common scripts to set up the container environment before invoking the Pulp service processes. That's next.

Using a Base Image: The Pulp-Beat Component


The Pulp service is based on Celery. Celery is a framework for creating distributed task-based services. You extend the Celery framework to add the specific tasks that your application needs.

The task management is controlled by a "beat" process. Each Celery based service has to have exactly one beat server which is derived from the Celery scheduler class.

The beat server is a convenient place to do some of the service setup. Since there can only be one beat server and because it must be created first, I can use the beat service container startup to initialize the database.

The Docker development best-practices encourage image composition by layering. Creating a new layer means creating a new build space with a Dockerfile and any files that will be pulled in when the image is built.

In the case of the pulp-base image all of the content is there. The customizations for the pulp-beat service are just the run script which configures and initializes the service before starting. The Dockerfile is trivially simple:
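The gist isn't reproduced here; this is a sketch of that trivially simple Dockerfile, assuming the base image is pushed as markllama/pulp-base:

FROM markllama/pulp-base
MAINTAINER markllama

# run.sh configures the container and then execs the celery beat process
ADD run.sh /run.sh
CMD ["/run.sh"]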


The real meat is in the run script, though even that is pretty anemic.
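The run script was also embedded as a gist. Its main section, reconstructed from the set -x trace below, amounts to the following sketch; the error branches are assumptions, and the line numbers mentioned next refer to the original script.

# main section; set -x sits at the top of the real script and the helper
# functions (wait_for_database, initialize_database, run_celerybeat) are
# defined above this point
if [ ! -x /configure_pulp_server.sh ] ; then
    echo "ERROR: /configure_pulp_server.sh is missing or not executable"
    exit 2
fi
. /configure_pulp_server.sh

if [ ! -x /test_db_available.py ] ; then
    echo "ERROR: /test_db_available.py is missing or not executable"
    exit 2
fi
wait_for_database

initialize_database

run_celerybeat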

The main section starts at line 44 and it's really just four steps.  Two are defined in the base image scripts and two more are defined here.

  1. Apply the configuration customizations from the environment
    These include setting the PULP_SERVER_NAME and the access parameters for the MongoDB and QPID services
  2. Verify that the MongoDB is up and accessible
    With Kubernetes you can't be dependent on ordering of the pod startups.  This check allows some time for the DB to start and become available.  Kubernetes will restart the beat pod if this fails but the checks here prevent some thrashing.
  3. Initialize the MongoDB
    This should only happen once.  Within a pulp service the beat server is a singleton.  I put the initialization step here so that it won't be confused later.
  4. Execute the master process
    This is a celery beat process customized with the Pulp master object
Even though the script line for each operation is fairly trivial I still put them into their own functions. This makes it easier for a reader to understand the logical progression and intent before going back to the function and examining the details.  It also makes it easier to comment out a single function for testing and debugging.

Testing the Beat Image (stand-alone)


Since Kubernetes currently gives so little access to debug information for the container startup process, I'm going to test the Pulp beat container first as a regular Docker container.  I have my Kubernetes cluster running in Vagrant and I know the IP addresses of the MongoDB and QPID services.

The other reason to test in plain Docker is that I want to manually verify the code which picks up and uses the configuration environment variables.  There are four variables that will be required and two others that will likely default.
  • PULP_SERVER_NAME
  • SERVICE_HOST
  • DB_SERVICE_HOST
  • MSG_SERVICE_HOST
The defaulted ones will be
  • DB_SERVICE_PORT
  • MSG_SERVICE_PORT
DB_SERVICE_HOST and MSG_SERVICE_HOST can be provided directly or can pick up the value of SERVICE_HOST.  I want to test both paths.

To test this I'm going to be running the Kubernetes Vagrant cluster on Virtualbox to provide the MongoDB and QPID servers.  Then I'll run the Pulp beat server in Docker on the host. I know how to tell the beat server how to reach the services in the Kubernetes cluster (on 10.245.2.{2-4}).

I'm going to assume that both the pulp-base and pulp-beat images are already built. I'm also going to start the container the first time using /bin/sh so I can manually start the run script and observe what it does.

docker run -d --name pulp-beat -v /dev/log:/dev/log \
>   -e PULP_SERVER_NAME=pulp.example.com \
>   -e SERVICE_HOST=10.245.2.2 markllama/pulp-beat
f16a6f2278e20e0b039cb665bc5f55de39b13a1045f00e25cdab5219652f1d80

This starts the container as a daemon and mounts /dev/log so that syslog will work. It also sets the PULP_SERVER_NAME and SERVICE_HOST variables.

docker logs pulp-beat
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ initialize_database
+ runuser apache -s /bin/bash /bin/bash -c /usr/bin/pulp-manage-db
Loading content types.
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
Applying pulp.server.db.migrations version 1
Migration to pulp.server.db.migrations version 1 complete.
...
Applying pulp_rpm.plugins.migrations version 16
Migration to pulp_rpm.plugins.migrations version 16 complete.
Database migrations complete.
+ run_celerybeat
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery beat --workdir=/var/lib/pulp/celery --scheduler=pulp.server.async.scheduler.Scheduler -f /var/log/pulp/celerybeat.log -l INFO'

This shows why I set the -x at the beginning of the run script.  It causes the shell to emit each line as it is executed. You can see the environment variables as they are set.  Then they are used to configure the pulp server.conf values. The database is checked and then initialized. Finally it executes the celery beat process which replaces the shell and continues executing.

When this script runs it should have several side effects that I can check. As noted, it creates and initializes the pulp database. It also connects to the QPID server and creates several queues. I can check them in the same way I did when I created the MongoDB and QPID images in the first place.

The database has been initialized

echo show dbs | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/test
local 0.03125GB
pulp_database 0.03125GB
bye

And the celery beat service has added a few queues to the QPID service

qpid-config queues -b guest@10.245.2.4
Queue Name                                     Attributes
======================================================================
0b78268e-256f-4832-bbcc-50c7777a8908:1.0       auto-del excl 
411cc98f-eed3-45f9-b455-8d2e5d333262:0.0       auto-del excl 
aaf61614-919e-49ea-843f-d83420e9232f:1.0       auto-del excl 
celeryev.de500902-4c88-4d5c-90f4-1b4db366613d  auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}

But what if I do it wrong?

You can see that the output from a correct startup is pretty lengthy.  When I'm happy that the image is stable I'll remove the shell -x setting (and make it either an argument or environment switch for later).  There are several other paths to test.


  1. Fail to provide Environment Variables
    1. PULP_SERVER_NAME
    2. SERVICE_HOST
    3. DB_SERVICE_HOST
    4. MSG_SERVICE_HOST
  2. Fail to import /dev/log volume
Each of these will have slightly different failure modes.  I suggest you try each of them and observe how it fails.  Think of others, I'm sure I've missed some.

For the purposes of this post I'm going to treat these as exercises for the reader and move on.

Testing the Beat Image (Kubernetes)


Now things get interesting.  I have to craft a Kubernetes pod description that creates the pulp-beat container, gives it access to logging and connects it to the database and messaging services.

Defining the Pulp Beat pod


Because of the way I crafted the base image and run scripts, this isn't actually as difficult or as complicated as you might think.  It turns out that the only environment variable I have to actually pass in is the PULP_SERVER_NAME.  The rest of the environment values are going to be provided by the kubelet as defined by the Kubernetes service objects (and served by the MongoDB and QPID containers behind them).
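The pod definition was embedded as a gist in the original post. Below is a sketch reconstructed from the v1beta1 schema that kubecfg --json get returns for these pods; the line numbers discussed in the next few paragraphs refer to the original file, so they won't match this sketch exactly.

{
  "id": "pulp-beat",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "pulp-beat",
      "containers": [
        {
          "name": "pulp-beat",
          "image": "markllama/pulp-beat",
          "env": [
            { "name": "PULP_SERVER_NAME", "value": "pulp.example.com" }
          ],
          "volumeMounts": [
            { "name": "devlog", "readOnly": false, "mountPath": "/dev/log" }
          ]
        }
      ],
      "volumes": [
        { "name": "devlog", "source": { "hostDir": { "path": "/dev/log" } } }
      ]
    }
  },
  "labels": { "name": "pulp-beat" }
}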



The only really significant thing here is the volume imports.

Pulp uses the python logging mechanism and that in turn by default requires the syslog service. On Fedora 20, syslog is no longer a separate process.  It's been absorbed into the systemd suite of low level services and is known now as journald.  (cat flamewars/systemd/{pro,con} >/dev/null).

For me this means that for Pulp to run properly it needs the ability to write syslog messages.  In Fedora 20 this amounts to being able to write to a special file /dev/log.  This file isn't available in containers without some special magic.  For Docker that magic is -v /dev/log:/dev/log. This imports the host's /dev/log into the container at the same location.  For Kubernetes this is a little bit more involved.

The Kubernetes pod construct has some interesting side-effects.  The purpose of pods is to allow the creation of sets of containers that share resources.  The JSON reflects this in how the shared resources are declared.

In the pod spec, lines 14-20 are inside the container hash for the container named pulp-beat.  They indicate that a volume named "devlog" (line 15) will be mounted read/write (line 16) on /dev/log inside the container (line 17).

Note that this section does not define the named volume or indicate where it will come from. That's defined at the pod level not the container.

Now look at lines 20-23. These are at the pod level (the list of containers has been closed on line 19). The volumes array contains a set of volume definitions. I only define one, named "devlog" (line 21), and indicate that it comes from the host and that the source path is /dev/log.

All that to replace the docker argument -v /dev/log:/dev/log.

Right now this seems like a lot of work for a trivial action. Later this distinction will become very important.  The final pod for Pulp will be made up of at least two containers.  The pod will import two different storage locations from the host and both containers will mount them.

One last time for clarity: the volumes list is at the pod level.  It defines a set of external resources that will be made available to the containers in the pod.  The volumeMounts list is at the container level.  It maps entries from the volumes section in the pod to mount points inside the container using the value of the name as the connecting handle.


Starting the Pulp Beat Pod


Starting the pulp beat pod is just like starting the MongoDB and QPID pods was. At this point it does require that the Service objects have been created and that the service containers are running, so if you're following along and haven't done that, go do it. Since I'd run my pulp beat container manually and it had modified the mongodb, I also removed the pulp_database before proceeding.

echo 'db.dropDatabase()' | mongo 10.245.2.2/pulp_database
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/pulp_database
{ "dropped" : "pulp_database", "ok" : 1 }
bye
echo show dbs | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/test
local 0.03125GB
bye

To start the pulp beat pod we go back to kubecfg (remember, I aliased kubecfg=~/kubernetes/cluster/kubecfg.sh).

kubecfg -c pods/pulp-beat.json create pods
ID                  Image(s)              Host                Labels              Status
----------          ----------            ----------          ----------          ----------
pulp-beat           markllama/pulp-beat   /                   name=pulp-beat      Waiting

kubecfg get pods/pulp-beat
ID                  Image(s)              Host                    Labels              Status
----------          ----------            ----------              ----------          ----------
pulp-beat           markllama/pulp-beat   10.245.2.2/10.245.2.2   name=pulp-beat      Waiting

Now that I know the pod has been assigned to 10.245.2.2 (minion-1), I can log in there directly and examine the Docker container.

vagrant ssh minion-1
Last login: Fri Dec 20 18:02:34 2013 from 10.0.2.2
sudo docker ps | grep pulp-beat
2515129f2c7e        markllama/pulp-beat:latest   "/run.sh"              54 seconds ago      Up 53 seconds                                k8s--pulp_-_beat.a6ba93e9--pulp_-_beat.etcd--d2a60369_-_458d_-_11e4_-_b682_-_0800279696e1--0b799f3d   
sudo docker logs 2515129f2c7e
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ initialize_database
+ runuser apache -s /bin/bash /bin/bash -c /usr/bin/pulp-manage-db
Loading content types.
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
Applying pulp.server.db.migrations version 1
Migration to pulp.server.db.migrations version 1 complete.
...
Applying pulp_rpm.plugins.migrations version 16
Migration to pulp_rpm.plugins.migrations version 16 complete.
Database migrations complete.
+ run_celerybeat
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery beat --workdir=/var/lib/pulp/celery --scheduler=pulp.server.async.scheduler.Scheduler -f /var/log/pulp/celerybeat.log -l INFO'

If this is the first time running the image it may take a while for Kubernetes/Docker to pull it from the Docker hub. There may be a delay as the kubernetes pause container does the pull.

I can now run the same tests I did earlier on the MongoDB and QPID services to reassure myself that the pulp beat service is connected.

echo show dbs | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/test
local 0.03125GB
pulp_database 0.03125GB
bye

qpid-config queues -b guest@10.245.2.4
Queue Name                                     Attributes
======================================================================
613f4b89-e63e-4230-9620-e932f5a777e5:0.0       auto-del excl 
c990ea7b-3d7f-4603-80e5-176ebc649ff1:1.0       auto-del excl 
celeryev.ffbc537b-1161-4049-b425-723487135fc2  auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
e0155372-12ee-4c9a-9c4d-8f4863601b3a:1.0       auto-del excl 

After all that thought and planning the end result is actually kinda boring.  Just the way I like it.

What's next?


The pulp-beat service is just the first real pulp component.  It runs in isolation from the other components, communicating only through the messaging and database.  There is another component like that, the pulp-resource-manager.  This is another Celery process and it is created, started and tested just like the pulp-beat service.  I'm going to do one much-shorter post on that for completeness before tackling the next level of complexity.

The two remaining components are the content pods, which require shared storage and which will have two cooperating containers running inside the pod.  One will manage the content mirroring and the other will serve the content out to clients.

I think before that though I will tackle the Pulp Admin service.  This is a public facing REST service which accepts pulp admin commands to create and manage the content repositories.

Both of these will require the establishment of encryption, which means placing x509 certificates within the containers.  These are the upcoming challenges.


References

  • Docker - Containerized applications
  • Kubernetes - Orchestration for creating containerized services
  • MongoDB - A Non-relational database
  • QPID - an AMQP messaging service
  • Pulp - An enterprise OS content mirroring system
  • Celery - A Distributed Task Queue Framework
  • Augeas - Structured queries and updates to (largely) unstructured configurations
  • INI Files - A simple format for simple configurations

Tuesday, September 23, 2014

Kubernetes Under The Hood: Etcd

Kubernetes is an effort which originated within Google to provide an orchestration layer above Docker containers.  Docker operation is limited to actions on a single host.  Kubernetes attempts to provide a mechanism to manage large sets of containers on a cluster of container hosts.  Above that will eventually be job management services like Mesos or Aurora.

Anatomy of a Kubernetes Cluster


A Kubernetes cluster is made up of three major active components

  1. Kubernetes app-service
  2. Kubernetes kubelet agent
  3. etcd distributed key/value database

The app-service is the front end of the Kubernetes cluster.  It accepts requests from clients to create and manage containers, services and replication controllers within the cluster. This is the control interface of Kubernetes.

The kubelet is the active agent.  It resides on a Kubernetes cluster member host.  It polls for instructions or state changes and acts to execute them on the host.

The etcd services are the communications bus for the Kubernetes cluster.  The app-service posts cluster state changes to the etcd database in response to commands and queries.  The kubelets read the contents of the etcd database and act on any changes they detect.

There's also a kube-proxy process which does the Service network proxy work but that's not relevant to the larger operations.

This post is going to describe and play with the etcd.

OK, so what is Etcd?


Etcd (or etcd) is a service created by the CoreOS team to create a shared distributed configuration database.  It's a replicated key/value store.  The data are accessed using ordinary HTTP(S) GET and PUT queries.  The status, metadata and payload are returned as members of a JSON data structure.

Etcd has a companion CLI client for testing and manual interaction.  This is called etcdctl.  Etcdctl is merely a wrapper that hides the HTTP interactions and the raw JSON that is used as status and payload.

Installing and Running Etcd


Etcd (and etcdctl, the CLI client) aren't yet available in RPM format from the standard repositories, or if they are they're very old. If you're running on 64 bit Linux you can pull the most recent binaries from the Github repository for CoreOS. Download them, unpack the tar.gz file and place the binaries in your path.


curl -s -L https://github.com/coreos/etcd/releases/download/v0.4.6/etcd-v0.4.6-linux-amd64.tar.gz | tar -xzvf -
etcd-v0.4.6-linux-amd64/
etcd-v0.4.6-linux-amd64/etcd
etcd-v0.4.6-linux-amd64/etcdctl
etcd-v0.4.6-linux-amd64/README-etcd.md
etcd-v0.4.6-linux-amd64/README-etcdctl.md
cd etcd-v0.4.6-linux-amd64


Once you have the binaries, check out the Etcd and Etcdctl github pages for basic usage instructions.  I'll duplicate here a little bit just to get moving.

Etcd doesn't run as a traditional daemon.  It remains connected to STDOUT and logs activity.  I'm not going to demonstrate here how to turn it into a proper daemon.  Instead I'll run it in one terminal session and use another to access it.

NOTE 1: Etcd does not use standard longopts conventions.  All of the options use single leading hyphens.
NOTE 2: Etcdctl does follow the longopt conventions.  Go figure.

./etcd
[etcd] Sep 23 10:36:04.655 WARNING   | Using the directory myhost.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Sep 23 10:36:04.656 INFO      | myhost is starting a new cluster
[etcd] Sep 23 10:36:04.658 INFO      | etcd server [name myhost, listen on :4001, advertised url http://127.0.0.1:4001]
[etcd] Sep 23 10:36:04.658 INFO      | peer server [name myhost, listen on :7001, advertised url http://127.0.0.1:7001]
[etcd] Sep 23 10:36:04.658 INFO      | myhost starting in peer mode
[etcd] Sep 23 10:36:04.658 INFO      | myhost: state changed from 'initialized' to 'follower'.
[etcd] Sep 23 10:36:04.658 INFO      | myhost: state changed from 'follower' to 'leader'.
[etcd] Sep 23 10:36:04.658 INFO      | myhost: leader changed from '' to 'myhost'.

As you can see the daemon listens by default to the localhost interface on port 4001/TCP for client interactions and on port 7001/TCP for clustering communications.  See the output of etcd -help for detailed options.  You can also see the process whereby the new daemon attempts to connect to peers and determine its place within the cluster.  Since there are no peers, this one elects itself leader.

That output looks as if the etcd is running. I can check by querying the daemon version and some other information.

curl -s http://127.0.0.1:4001/version
etcd 0.4.6

I can also get some stats from the daemon directly as well:

curl -s -L http://127.0.0.1:4001/v2/stats/self | python -m json.tool
{
    "leaderInfo": {
        "leader": "myhost",
        "startTime": "2014-09-23T10:37:04.839453766-04:00",
        "uptime": "5h10m13.053046076s"
    },
    "name": "myhost",
    "recvAppendRequestCnt": 0,
    "sendAppendRequestCnt": 0,
    "startTime": "2014-09-23T10:37:04.83945236-04:00",
    "state": ""
}

So now I know it's up and responding.

Playing with Etcd


Etcd responds to HTTP(S) queries both to set and retrieve data.  All of the data are organized into a hierarchical key set (which for normal people means that the keys look like files in a tree of directories).  The values are arbitrary strings. This makes it very easy to test and play with etcd using ordinary CLI web query tools like curl and wget. The binary releases also include a CLI client called etcdctl which simplifies the interaction, allowing the caller to focus on the logical operation and the result rather than the HTTP/JSON interaction. I'll show both methods where they are instructive, choosing the best one for each example.

The examples here are adapted from the CoreOS examples on Github.  There's also a complete protocol document for it.

Once the etcd is running I can begin working with it.

Etcd is a hierarchical key=value store. This means that each piece of stored data has a key which uniquely identifies it within the database. The key is hierarchical in that the key is composed of a set of elements that form a path from a fixed known starting point for the database known as the root. Any given element in the database can either be a branch (directory) or a leaf (value).  Directories contain other keys and are used to create the hierarchy of data.

This is all formal gobbledy-gook for "it looks just like a filesystem". In fact a number of the operations that etcdctl offers are exact analogs of filesystem commands: mkdir, rmdir, ls, rm.

The first operation is to look at the contents of the root of the database. Expect this to be boring because there's nothing there yet.


./etcdctl ls /


See? There's nothing there. Boring.

It looks a little different when you pull it using curl.

curl -s http://127.0.0.1:4001/v2/keys/ | python -m json.tool
{
    "action": "get",
    "node": {
        "dir": true,
        "key": "/"
    }
}

The return payload is JSON. I use the python json.tool module to pretty print it.

I can see that this is the response to a GET request. The node hash describes the query and result. I asked for the root key (/) and it's an (empty) directory.

Life will be a little more interesting if there's some data in the database. I'll add a value and I'm going to put it well down in the hierarchy to show how the tree structure works.

./etcdctl set /foo/bar/gronk "I see you"
I see you

Now when I ask etcdctl for the contents of the root directory I at least get some output:

./etcdctl ls /
/foo

But that's much more interesting when I look using curl.

curl -s http://127.0.0.1:4001/v2/keys/ | python -m json.tool
{
    "action": "get",
    "node": {
        "dir": true,
        "key": "/",
        "nodes": [
            {
                "createdIndex": 7,
                "dir": true,
                "key": "/foo",
                "modifiedIndex": 7
            }
        ]
    }
}

This looks very similar to the previous response with the addition of the nodes array.  I can infer that this list contains the set of directories and values that the root contains.  In this case it contains one other subdirectory named /foo.
Creating a new value is also more fun using curl:

curl -s http://127.0.0.1:4001/v2/keys/fiddle/faddle -XPUT -d value="popcorn" | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 8,
        "key": "/fiddle/faddle",
        "modifiedIndex": 8,
        "value": "popcorn"
    }
}

The return payload is the REST acknowledgement response to the PUT query. It looks similar to the GET query response, but not identical. The action is (appropriately enough) set. Only a single node is returned, not the node list you get when querying a directory and the value is provided as well.  The REST protocol (and the etcdctl command) allow for a number of modifiers for queries. Two I'm going to use a lot are sort and recursive.

If I want to see the complete set of nodes underneath a directory I can use etcdctl ls with the --recursive option:


./etcdctl ls / --recursive
/foo
/foo/bar
/foo/bar/gronk
/fiddle
/fiddle/faddle

That's a nice pretty listing. As you can imagine, this gets a bit messier if you use curl for the query. This is probably the last time I'll use curl for a query here.  

curl -s http://127.0.0.1:4001/v2/keys/?recursive=true | python -m json.tool
{
    "action": "get",
    "node": {
        "dir": true,
        "key": "/",
        "nodes": [
            {
                "createdIndex": 7,
                "dir": true,
                "key": "/foo",
                "modifiedIndex": 7,
                "nodes": [
                    {
                        "createdIndex": 7,
                        "dir": true,
                        "key": "/foo/bar",
                        "modifiedIndex": 7,
                        "nodes": [
                            {
                                "createdIndex": 7,
                                "key": "/foo/bar/gronk",
                                "modifiedIndex": 7,
                                "value": "I see you"
                            }
                        ]
                    }
                ]
            },
            {
                "createdIndex": 8,
                "dir": true,
                "key": "/fiddle",
                "modifiedIndex": 8,
                "nodes": [
                    {
                        "createdIndex": 8,
                        "key": "/fiddle/faddle",
                        "modifiedIndex": 8,
                        "value": "popcorn"
                    }
                ]
            }
        ]
    }
}

Clustering Etcd


Etcd is designed to allow database replication and the formation of clusters.  When two etcds connect, they use a different port from the normal client access port.  An etcd that intends to participate listens on that second port and also connects to a list of peer processes which are also listening.

You can set up peering (replication) using the command line arguments --peer-addr and --peers, or you can set the values in the configuration file /etc/etcd/etcd.conf.
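As a sketch, using only the flags mentioned above and made-up addresses, a two-member cluster might be started something like this:

# first member listens for peers on a second port
./etcd --peer-addr 10.0.0.1:7001

# second member announces its own peer address and points at the first
./etcd --peer-addr 10.0.0.2:7001 --peers 10.0.0.1:7001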

Complete clustering documentation can be found on Github.

Etcd and Security


Etcd communications can be encrypted using SSL, but there is no authentication or access control. This makes it simple to use, but it also makes it critical that you never place sensitive information like passwords or private keys into etcd. It also means that, when using etcd, you are assuming there are no malicious actors in the network space that has access to it.  Any process with network access can both read and write any keys and values within the etcd.  It is absolutely essential that access to etcd be protected at the network level, because there's nothing else restricting access.

Instructions for enabling SSL to encrypt etcd traffic are also on Github.

Etcd can be configured to restrict access to queries which present a client certificate, but this provides very limited access control.  Clients are either allowed full access or denied entirely.  There is no concept of a user, authentication, or access-control policy once a connection has been allowed.
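A rough sketch of both pieces, with flag names as I remember them from the Github docs (so double-check them there) and made-up file names:

# serve the client port over HTTPS
./etcd --cert-file=server.crt --key-file=server.key

# additionally require clients to present a certificate signed by a trusted CA
./etcd --cert-file=server.crt --key-file=server.key --ca-file=ca.crt

# a client then supplies its certificate along with the query
curl --cacert ca.crt --cert client.crt --key client.key https://127.0.0.1:4001/v2/keys/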

Additional Capabilities of Etcd


Don't make the mistake of thinking that Etcd is a simple networked filesystem with an HTTP/REST protocol interface. Etcd has a number of other important capabilities related to its role in configuration and cluster management.

Each directory or leaf node can have a Time To Live or TTL value associated with it.  The TTL indicates the lifespan of the key/value pair in seconds.  If a TTL is supplied when a value is set, the key/value pair expires when the TTL counts down to zero; after that the value is no longer available.
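For example (the --ttl option and the ttl form parameter here are from memory, so verify them against the protocol document):

# this key evaporates ten seconds after it is set
./etcdctl set /foo/ephemeral "short lived" --ttl 10

# the same thing over HTTP
curl -s http://127.0.0.1:4001/v2/keys/foo/ephemeral -XPUT -d value="short lived" -d ttl=10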

It is also possible to create hidden nodes. These are nodes that will not appear in directory listings.  To access them the query must specify the correct path explicitly.  Any node name which begins with an underscore character (_) will be hidden from directory queries.
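A quick sketch:

# the leading underscore hides the key from listings
./etcdctl set /foo/_hidden "peekaboo"

# ls on the parent won't show it...
./etcdctl ls /foo

# ...but an explicit get still returns the value
./etcdctl get /foo/_hidden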

Most importantly, it is possible for clients to wait for changes to a key.  If I issue a GET query on a key with the wait flag set, the query blocks, leaving the query incomplete and the TCP session open.  Assuming that the client doesn't time out, the query remains open and unresolved until etcd detects (and executes) a change request on that key.  At that point the waiting query completes and returns the new value.  This can be used as an event notification or messaging mechanism to avoid unnecessary polling.
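A sketch of what that looks like, assuming the wait query parameter and etcdctl's watch command behave the way I've described:

# this query blocks until something changes the key...
curl -s http://127.0.0.1:4001/v2/keys/foo/bar/gronk?wait=true

# ...for example, from another shell:
./etcdctl set /foo/bar/gronk "now you don't"

# etcdctl offers the same thing as a watch
./etcdctl watch /foo/bar/gronk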

Etcd in Kubernetes


Etcd is used by Kubernetes both as the cluster state database and as the communications mechanism between the app-server and the kubelet processes on the minion hosts.  The app-server places values into etcd in response to requests from the users for things like new pods or services, and it queries values from it to get status on the minions, pods and services.

The kubelet processes also both query and update the contents of the database.  They poll for desired state changes and create new pods and services in response.  They also push status information back to the etcd to make it available to client queries.

The root of the Kubernetes data tree within the etcd database is /registry. Let's see what's there.



./etcdctl ls /registry --recursive
/registry/services
/registry/services/specs
/registry/services/specs/db
/registry/services/specs/msg
/registry/services/endpoints
/registry/services/endpoints/db
/registry/services/endpoints/msg
/registry/pods
/registry/pods/pulpdb
/registry/pods/pulpmsg
/registry/pods/pulp-beat
/registry/pods/pulp-resource-manager
/registry/hosts
/registry/hosts/10.245.2.3
/registry/hosts/10.245.2.3/kubelet
/registry/hosts/10.245.2.4
/registry/hosts/10.245.2.4/kubelet
/registry/hosts/10.245.2.2
/registry/hosts/10.245.2.2/kubelet

I'm running the Vagrant cluster on Virtualbox with three minions.  These are listed under the hosts subtree. 

I've also defined two services, db and msg which are found under the services subtree.  The service data is divided into two parts.  The specs tree contains the definitions I provided for the two services.  The endpoints subtree contains records which indicate the actual locations of the containers labeled to accept the service connections.

Finally I've defined four pods which make up the service I'm building (which happens to be a Pulp service). Each host is listed by its IP address at the moment. Work is ongoing to allow the minions to be referred to by host name, but that requires control of the name service available inside the containers. Without a universal name service for containers, IP addresses are the only way for processes inside a container to find hosts outside.

Some of the values here will look familiar to someone who has created pods and services using the kubecfg client.  They are nearly identical to the JSON query and response payloads from the Kubernetes app-server.
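For example, I can pull one of the service specs straight out of etcd (I won't reproduce the output here, but it's essentially the same JSON the app-server would return):

./etcdctl get /registry/services/specs/db | python -m json.tool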

I don't recommend making any changes or additions to the etcd database in a running Kubernetes cluster. I haven't yet looked deeply into how the app-server and kubelet interact with etcd, and I think it would be very easy to upset them.  For now I'm able to query etcd to confirm that my commands have or have not been initiated and to compare what I see to what I expect.


Summary


Etcd is a neat tool for storing and sharing configuration data.  It's only useful (so far) in limited cases where there are no malicious or careless users, but it's a very young project.  I am speculating that etcd is a temporary component of Kubernetes.  It provides the features needed to facilitate the development of the app-server and kubelet, which are the core functions of Kubernetes.  Once those are stable, if others feel the need for a more secure or scalable component then it can be swapped in.  The configuration payload can remain and only the communications mechanism will need to be replaced.


References



Thursday, September 4, 2014

Kubernetes: Simple Containers and Services

From previous posts I now have a MongoDB image and another which runs a QPID AMQP broker.  I intend for these to be used by the Pulp service components.

What I'm going to do this time is to create the subsidiary services that I'll need for the Pulp service within a Kubernetes cluster.

UPDATE 12/16/2014: recently the kubecfg command has been deprecated and replaced with kubectl. I've updated this post to reflect the CLI call and output from kubectl.

Pre-Launch


A Pulp service stores its persistent data in the database.  The service components (a Celery Beat server and a number of Celery workers, as well as one or more Apache web server daemons) all communicate using the AMQP message broker.  They store and retrieve data from the database.

In a traditional bare-metal or VM based installation all of these services would likely be run on the same host.  If they are distributed, then the IP addresses and credentials of the support services would have to be configured into the Pulp servers manually or using some form of configuration management. Using containers the components can be isolated, but the task of tracking them and configuring the consumer processes remains.

Using just Docker, the first impulse of an implementer would be similar: place all of the containers on the same host.  This would simplify the management of the connectivity between the parts, but it also defeats some of the benefit of containerized applications: portability and non-locality. This isn't a failing of Docker. It is the result of conscious decisions to limit the scope of what Docker attempts to do, avoiding feature creep and bloat.  And this is where a tool like Kubernetes comes in.

As mentioned elsewhere, Kubernetes is a service designed to bind together a cluster of container hosts. These can be regular hosts running the etcd and kubelet daemons, or specialized images like Atomic or CoreOS.  They can be private hosts or public services such as Google Cloud.

For Pulp, I need to place a MongoDB and a QPID container within a Kubernetes cluster and create the infrastructure so that clients can find it and connect to it.  For each of these I need to create a Kubernetes Service and a Pod (group of related containers).

Kicking the Tires


It's probably a good thing to explore a little bit before diving in so that I can see what to expect from Kubernetes in general.  I also need to verify that I have a working environment before I start trying to bang on it.

Preparation


If you're following along, at this point I'm going to assume that you have access to a running Kubernetes cluster.  I'm going to be using the Vagrant test cluster as defined in the github repository and described in the Vagrant version of the Getting Started Guides.

I'm also going to assume that you've built the kubernetes binaries.  I'm using the shell wrappers in the cluster sub-directory, especially cluster/kubectl.sh.   If you try that and you haven't built the binaries you'll get a message that looks like this:

cluster/kubectl.sh 
It looks as if you don't have a compiled kubectl binary.

If you are running from a clone of the git repo, please run
'./build/run.sh hack/build-cross.sh'. Note that this requires having
Docker installed.

If you are running from a binary release tarball, something is wrong. 
Look at http://kubernetes.io/ for information on how to contact the 
development team for help.

If you see that, do as it says. If that fails, you probably haven't installed the golang package.



For convenience I alias the kubectl.sh wrapper so that I don't need the full path.

alias kubectl=~/kubernetes/cluster/kubectl.sh

Like most CLI commands now, if you invoke it with no arguments (or with --help) it prints usage.

kubectl --help 2>&1 | more
Usage of kubectl:

Usage: 
  kubectl [flags]
  kubectl [command]

Available Commands: 
  version                                             Print version of client and server
  proxy                                               Run a proxy to the Kubernetes API server
  get [(-o|--output=)json|yaml|...] <resource> [<id>] Display one or many resources
  describe <resource> <id>                            Show details of a specific resource
  create -f filename                                  Create a resource by filename or stdin
  createall [-d directory] [-f filename]              Create all resources specified in a directory, filename or stdin
  update -f filename                                  Update a resource by filename or stdin
  delete ([-f filename] | (<resource> <id>))          Delete a resource by filename, stdin or resource and id

The full usage output can be found in the CLI documentation in the Kubernetes Github repository.

kubectl has one oddity that makes a lot of sense once you understand why it's there. The command is meant to produce output which is consumable by machines using UNIX pipes. The output is structured data formatted using JSON or YAML. To avoid strange errors in the parsers, the only output to STDOUT is structured data. This means that all of the human readable output goes to STDERR. This isn't just the error output though. This includes the help output. So if you want to run the help and usage output through a pager app like more(1) or less(1), you have to first redirect STDERR to STDOUT as I did above.

Exploring the CLI control objects


You can see in the usage output the possible operations: get, describe, create, createall, update and delete. The resources that these operate on are minions, pods, replicationControllers and services.

Minions


A minion is a host that can accept containers.  It runs an etcd and a kubelet daemon in addition to the Docker daemon. For our purposes a minion is where containers can go.

I can list the minions in my cluster like this:

kubectl get minions
NAME                LABELS
10.245.2.4          <none>
10.245.2.2          <none>
10.245.2.3          <none>

The only valid operations on minions using the REST protocol are the list and get actions.  The get response isn't very interesting.

Until I add some of the other objects this is the most interesting query.  It indicates that there are three minions connected and ready to accept containers.

Pods


A pod is the Kubernetes object which describes a set of one or more containers to be run on the same minion.  While the point of a cluster is to allow containers to run anywhere within the cluster, there are times when a set of containers must run together on the same host. Perhaps they share some external filesystem or some other resource.  See the golang specification for the Pod struct.

kubectl get pods
NAME                IMAGE(S)            HOST                    LABELS              STATUS

See? Not very interesting.

Replication Controllers


I'm going to defer talking about replication controllers in detail for now.  It's enough to note their existence and purpose.

Replication controllers are the tool to create HA or load balancing systems. Using a replication controller you can tell Kubernetes to create multiple running containers for a given image.  Kubernetes will ensure that if one container fails or stops, a new container will be spawned to replace it.

I can list the replication controllers in the same way as minions or pods, but there's nothing to see yet.

Services


I think the term service is an unfortunate but probably unavoidable terminology overload.

In Kubernetes, a service defines a TCP or UDP port reservation.  It provides a way for applications running in containers to connect to each other without requiring that each one be configured with the end-point IP addresses. This both allows for abstracted configuration and for mobility and load balancing of the providing containers.

When I define a Kubernetes service, the service providers (the MongoDB and QPID containers) will be labeled to receive traffic and the service consumers (the Pulp components) will be given the access information in the environment so that they can reach the providers. More about that later.

I can list the services in the same way as I would minions or pods. And it turns out that creating a couple of Kubernetes services is the first step I need to take to prepare the Pulp support service containers.

Creating a Kubernetes Service Object


In a cloud cluster one of the most important considerations is being able to find things.  The whole point of the cloud is to promote non-locality.  I don't care where things are, but I still have to be able to find them somehow.

A Kubernetes Service object is a handle that allows my MongoDB and QPID clients to find the servers without having to know where they really are. It defines a port to listen on and a way for containers to indicate that they want to accept the traffic that comes in. Kubernetes arranges for the traffic to be forwarded to the servers.

Kubernetes both accepts and produces structured data formats for input and reporting.  The two currently supported formats are JSON and YAML.  The Service structure is relatively simple but it has elements which are shared by all of the top level data structures. Kubernetes doesn't yet have any tooling to make the creation of an object description easier than hand-crafting a snippet of JSON or YAML.  Each of the structures is documented in the godoc for Kubernetes. For now that's all you get.

There are a couple of provided examples and these will have to do for now. The guestbook example demonstrates ReplicationControllers and a master/slave implementation using Redis.  The second shows how to perform a live update of the pods which make up an active service within a Kubernetes cluster. These are actually a bit more advanced than I'm ready for and don't give the detailed break-down of the moving parts that I mean to do.

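Here's roughly what mongodb-service.json contains, reconstructed from the values that appear in the kubectl output later in this post. Treat it as a sketch rather than the exact file, which also means the line numbers called out below could be off by one:

{
    "kind": "Service",
    "apiVersion": "v1beta1",
    "id": "db",
    "port": 27017,
    "publicIPs": ["10.245.2.2"],
    "selector": {
        "name": "db"
    }
}
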
This is a complete description of the service. Lines 5-8 define the actual content.
  • Line 2 indicates that this is a Service object.
  • Line 3 indicates the object schema version.
    v1beta1 is current
    (note: my use of the term 'schema' is a loose one)
  • Line 4 identifies the Service object.
    This must be unique within the set of services
  • Line 5 is the TCP port number that will be listening
  • Line 6 is for testing.  It tells the proxy on the minion with that IP to listen for inbound connections.
    I'll also use the publicIPs value to expose the HTTP and HTTPS services for Pulp
  • Lines 7-9 set the Selector
    The selector is used to associate this Service object with containers that will accept the inbound traffic.
    This will match with one of the label items assigned to the containers.

When a new service is created, Kubernetes establishes a listener on an available IP address (one of the minions' addresses).  While the service object exists, any new containers will start with a set of environment variables which provide access information.  The value of the selector (converted to upper case) is used as the prefix for these environment variables so that containers can be designed to pick them up and use them for configuration.
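For the db service those variables come out looking something like this (the names here are my guess at the convention rather than a copy from a running container; the authoritative list is whatever actually shows up in the pod's environment):

DB_SERVICE_HOST=10.0.41.48
DB_SERVICE_PORT=27017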

For now I just need to establish the service so that when I create the DB and QPID containers they have something to be bound to.

The QPID service is identical to the MongoDB service, replacing the port (5672) and the selector (msg).
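So the msg service description is just the same sketch with those substitutions. I've assumed the id changes along with the selector, and dropped publicIPs since it was only there to let me test the db service from outside the cluster:

{
    "kind": "Service",
    "apiVersion": "v1beta1",
    "id": "msg",
    "port": 5672,
    "selector": {
        "name": "msg"
    }
}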

Querying a Service Object


I've just created a Service object. I wonder what Kubernetes thinks of it? I can list the services as seen above. I can also get the object information using kubectl.

kubectl get services db
NAME                LABELS              SELECTOR            IP                  PORT
db                                name=db             10.0.41.48          27017


That's nice. I know the important information now.  But what does it look like, really?


kubectl get --output=json services db
{
    "kind": "Service",
    "id": "db",
    "uid": "c040da3d-8536-11e4-a18b-0800279696e1",
    "creationTimestamp": "2014-12-16T15:18:12Z",
    "selfLink": "/api/v1beta1/services/db?namespace=default",
    "resourceVersion": 13,
    "apiVersion": "v1beta1",
    "namespace": "default",
    "port": 27017,
    "protocol": "TCP",
    "selector": {
        "name": "db"
    },
    "publicIPs": [
        "10.245.2.2"
    ],
    "containerPort": 0,
    "portalIP": "10.0.41.48"
}


Clearly Kubernetes has filled out some of the object fields.  Note the --output=json flag for structured data.

I'll be using this method to query information about the other elements as I go along.

Describing a Container (Pod) in Kubernetes


We've seen how to run a container on a Docker host.  With Kubernetes we have to create and submit a description of the container with all of the required variables defined.

Kubernetes has an additional abstraction called a pod.  While Kubernetes is designed to allow the operator to ignore the location of containers within the cluster, there are times when a set of containers needs to be co-located on the same host.  A pod is Kubernetes' way of grouping containers when needed.  When starting a single container it will still be referred to as a member of a pod.


Here's the description of a pod containing the MongoDB service image I created earlier.


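Reconstructed from the kubectl get output further down (so a sketch rather than the literal file; the line numbers in the list below refer to the original and may not line up exactly, and I've left out the devlog volume that shows up in the live output), pods/mongodb.json looks something like this:

{
    "kind": "Pod",
    "apiVersion": "v1beta1",
    "id": "pulpdb",
    "labels": {
        "name": "db"
    },
    "desiredState": {
        "manifest": {
            "version": "v1beta1",
            "id": "pulpdb",
            "containers": [
                {
                    "name": "pulp-db",
                    "image": "markllama/mongodb",
                    "ports": [
                        {"containerPort": 27017}
                    ]
                }
            ]
        }
    }
}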


This is actually a set of nested structures, maps and arrays.


  • Lines 1-21 define a Pod.
  • Lines 2-4 are elements of an inline JSONBase structure
  • Lines 5-7 are a map (hash) of strings assigned to the Pod struct element named Labels.
  • Lines 8-20 define a PodState named DesiredState.
    The only required element is the ContainerManifest, named Manifest in the PodState.
  • A PodState has a required Version and ID, though it is not a subclass of JSONBase.
      It also has a list of Containers and an optional list of Volumes
  • Lines 12-18 define the set of containers (only one in this case) that will reside in the pod.
    A Container has a name and an image path (in this case to the previously defined mongodb image).
  • Lines 15-17 are a set of Port specifications.
      These indicate that something inside the container will be listening on these ports.


You can see how learning the total schema means fishing through each of these structure definitions in the documentation.  If you work at it you will get to know them.  To be fair they are really meant to be generated and consumed by machines rather than humans.  Kubernetes is still the business end of the service. Pretty dashboards will be provided later.  The only visibility I really need is for development and diagnostics. There are gaps here too, but finding them is what experiments like this are about.

A note on Names and IDs


There are several places where there is a key named "name" or "id". I could give them all the same value, but I'm going to deliberately vary them so I can expose which ones are used for what purpose. Names can be arbitrary strings. I believe that IDs are restricted somewhat (no hyphens).

Creating the first Pod


Now I can get back to business.

Once I have the Pod definition expressed in JSON I can submit that to kubectl for processing.


kubectl create -f pods/mongodb.json 
pulpdb


TADA! I now have a MongoDB running in Kubernetes.

But how do I know?


Now that I actually have a pod, I should be able to query the Kubernetes service about it and get more than an empty answer.

kubectl get pods pulpdb
NAME                IMAGE(S)            HOST                    LABELS              STATUS
pulpdb              markllama/mongodb   10.245.2.3/10.245.2.3   name=db             Running


Familiar and Boring. But I can get more from kubectl by asking for the raw JSON return from the query.

{
    "kind": "Pod",
    "id": "pulpdb",
    "uid": "4bac8381-8537-11e4-a18b-0800279696e1",
    "creationTimestamp": "2014-12-16T15:22:06Z",
    "selfLink": "/api/v1beta1/pods/pulpdb?namespace=default",
    "resourceVersion": 22,
    "apiVersion": "v1beta1",
    "namespace": "default",
    "labels": {
        "name": "db"
    },
    "desiredState": {
        "manifest": {
            "version": "v1beta2",
            "id": "",
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "hostDir": {
                            "path": "/dev/log"
                        },
...
            "pulp-db": {
                "state": {
                    "running": {
                        "startedAt": "2014-12-16T15:27:04Z"
                    }
                },
                "restartCount": 0,
                "image": "markllama/mongodb",
                "containerID": "docker://8f21d45e49b18b37b98ea7556346095261699bc3664b52813a533edccee55a63"
            }
        }
    }
}


It's really long, so I'm only including an excerpt inline. The full output is in a gist.

If you fish through it you'll find the same elements I used to create the pod, and lots, lots more.  The structure now contains both a desiredState and a currentState sub-structure, with very different contents.

Now a lot of this is just noise to us, but lines 59-72 are of particular interest.  They show the effects of the Service object that was created previously: the environment variables and network ports declared there are the values that a client container will use to connect to this service container.

Testing the MongoDB service


If you've read my previous blog post on creating a MongoDB Docker image you'll be familiar with the process I used to verify the basic operation of the service.

In that case I was running the container using Docker on my laptop.  I knew exactly where the container was running and I had direct access to the Docker CLI so that I could ask Docker about my new container.
I'd opened up the MongoDB port and told Docker to bind it to a random port on the host and I could connect directly to that port.

In a Kubernetes cluster there's no way to know a priori where the MongoDB container will end up. You have to ask Kubernetes where it is.  Further you don't have direct access to the Docker CLI.

This is where that publicIPs key in the mongodb-service.json file comes in.  I set the public IP value of the db service to an external IP address of one of the Kubernetes minions: 10.245.2.2.  This causes the proxy on that minion to accept inbound connections and forward them to the db service pods wherever they are.

The minion host is accessible from my desktop so I can test the connectivity directly.

echo "show dbs" | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.4/test
local 0.03125GB
bye

And now for QPID?


As with the Service object, creating and testing the QPID container within Kubernetes requires the same process.  Create a JSON file which describes the QPID service and another for the pod.  Submit them and test as before.

Summary


Now I have two running network services inside the Kubernetes cluster. Each consists of a Kubernetes Service object and a Kubernetes Pod running the image I'd created for that application.

I can prove to myself that the application services are running and accessible, though for some of the detailed tests I still have to go under the covers of Kubernetes.

I have the information I need to craft images for the other Pulp services so that they can consume the database and messenger services.

Next Up


In the next post I mean to create the first Pulp service image, the Celery Beat server.  There are elements that all of the remaining images will have in common, so I'm going to first build a base image and then apply the last layer to differentiate the beat server from the Pulp resource manager and the pulp workers.

References