Common Error Messages

danmacpherson edited this page Feb 7, 2013 · 6 revisions

This page lists commonly-encountered errors with Aeolus. If you encounter problems, you’re encouraged to document them here.

Conductor

I can’t log in with Aeolus 0.10.x, no admin user was created

The 0.10.x series of Aeolus has a bug where the admin user sometimes doesn’t get created.

The bug is fixed in newer development code, but for 0.10.x you’ll need to follow these simple steps as a workaround:

Change to the root user:

$ sudo su

Run the following commands:

# export RAILS_ENV=production
# cd /usr/share/aeolus-conductor
# rake dc:create_admin_user

This should create the admin user for you, so you can log in using admin/password.

Repeatedly can’t login to Conductor, keep getting logged out immediately after login.

The most likely cause is an incorrect system time, for example:

  • wrong timezone
  • NTP server not running, or the clock too far from real time

If the clock is too far off:

# systemctl stop ntpd.service
# ntpdate [YOUR FAVOURITE NTP SERVER, e.g., pool.ntp.org]
# systemctl start ntpd.service

(Make sure /etc/ntp.conf includes working NTP servers, too. Setting the clock with ntpdate is just a quick short-term measure to bring your clock roughly in sync with reality.)
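
Before resetting the clock, it can help to confirm how far off it actually is. The following is only a minimal sketch, not part of Aeolus: the helper name clock_drift_ok, the 5-second default tolerance, and the "trusted-host" machine in the commented example are all assumptions made for illustration. It compares two epoch timestamps and succeeds when they are close enough.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two epoch timestamps (in seconds) differ
# by at most a tolerance. The default of 5 seconds is an arbitrary example.
clock_drift_ok() {
    local_ts=$1
    reference_ts=$2
    tolerance=${3:-5}
    diff=$((local_ts - reference_ts))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$tolerance" ]
}

# Example (commented out): compare this host with a machine whose clock
# you trust. "trusted-host" is a placeholder name.
# clock_drift_ok "$(date -u +%s)" "$(ssh trusted-host date -u +%s)" \
#     || echo "clock drift detected"
```

Epoch seconds are independent of the timezone, so this only covers the drift case; running date on its own shows which timezone the machine believes it is in.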

Components Fail to Start

mongod dead but subsys locked

If you have tried to restart mongod and still get this error, the most likely cause is a stale lock file, which is left behind when mongod is shut down uncleanly. Stop mongod (as a formality, at least), rm /var/lib/mongodb/mongod.lock, and start it again. (Note that a mongod failure may require that you restart iwhd as well: service iwhd restart.)
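
The cleanup can be sketched as a small script. This is an illustration under assumptions rather than an official procedure: the helper name remove_stale_lock is made up here, and the commented sequence assumes the services are named mongod and iwhd and are managed with the service command, as in the text above.

```shell
#!/bin/sh
# Hypothetical helper: remove a lock file if it exists and report what
# happened. The path is a parameter so it can be tried on a scratch file.
remove_stale_lock() {
    lock_file=$1
    if [ -e "$lock_file" ]; then
        rm -f -- "$lock_file" && echo "removed $lock_file"
    else
        echo "no lock file at $lock_file"
    fi
}

# Full sequence (commented out; run as root):
# service mongod stop
# remove_stale_lock /var/lib/mongodb/mongod.lock
# service mongod start
# service iwhd restart
```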

Images Fail to Build

ImageFactoryException: OS plugin does not support distro (Fedora) update (16) in TDL

With an error like the following:

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/imgfac/Builder.py", line 115, in _build_image_from_template
self.os_plugin.create_base_image(self, template, parameters)
File "/usr/lib/python2.7/site-packages/imagefactory_plugins/FedoraOS/FedoraOS.py", line 255, in create_base_image
self._init_oz()
File "/usr/lib/python2.7/site-packages/imagefactory_plugins/FedoraOS/FedoraOS.py", line 237, in _init_oz
self.init_guest()
File "/usr/lib/python2.7/site-packages/imagefactory_plugins/FedoraOS/FedoraOS.py", line 305, in init_guest
raise ImageFactoryException("OS plugin does not support distro (%s) update (%s) in TDL" % (self.tdlobj.distro, self.tdlobj.update) )
ImageFactoryException: OS plugin does not support distro (Fedora) update (16) in TDL

First, take the error at face value and see if Oz supports the distro and update in question (Fedora 16 in the above). Running oz-install with no arguments will list all supported distros.

Assuming your distro and version are in the supported list, this probably means that your server does not actually support virtualization. In my case, service libvirtd status showed that the service was dead; starting it moved me along, but image builds then failed with this error:

libvirtError: internal error process exited while connecting to monitor: Could not access KVM kernel module: Permission denied
failed to initialize KVM: Permission denied
No accelerator found!

This particular machine claims the vmx CPU flag, but it possibly has hardware virtualization disabled in the BIOS, or is similarly in a nutty state.
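
A quick way to check what the host reports is sketched below. It is a rough check under assumptions: the helper name has_virt_flag is hypothetical, and a present vmx or svm flag does not prove that virtualization is enabled in the BIOS, as the case above shows.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a cpuinfo-style file lists a hardware
# virtualization flag, "vmx" (Intel VT-x) or "svm" (AMD-V), on a flags line.
has_virt_flag() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -Eq '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo"
}

# Example checks (commented out):
# has_virt_flag || echo "CPU does not advertise vmx or svm"
# [ -e /dev/kvm ] || echo "/dev/kvm is missing; the KVM module may not be loaded"
# [ -r /dev/kvm ] && [ -w /dev/kvm ] || echo "current user cannot access /dev/kvm"
```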

M2Crypto.X509.X509Error

An error such as the following indicates a problem with your certificate:

2011-07-27 11:37:18,890 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(7146) Message: Exception caught in ImageFactory
2011-07-27 11:37:18,893 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(7146) Message: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 499, in push_image
    self.push_image_snapshot(target_image_id, provider, credentials)
  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 511, in push_image_snapshot
    self.push_image_snapshot_ec2(target_image_id, provider, credentials)
  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 810, in push_image_snapshot_ec2
    stdout, stderr, retcode = self.guest.guest_execute_command(guestaddr, command)
  File "/usr/lib/python2.6/site-packages/oz/RedHat.py", line 365, in guest_execute_command
    "root@" + guestaddr, command])
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 77, in subprocess_check_output
    stderr))
OzException: 'ssh -i /tmp/tmpDxALEm -o ServerAliveInterval=30 -o StrictHostKeyChecking=no -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o PasswordAuthentication=no [email protected] euca-bundle-vol -c /tmp/tmp4XZlqz -k /tmp/tmpnYntjp -u 0101-1998-2923 -e /mnt,/tmp,/root/.ssh --arch x86_64 -d /mnt/bundles --kernel aki-427d952b -p db231b1a-6ef1-4a61-b253-8fc6024c8564 -s 10240 --ec2cert /tmp/cert-ec2.pem --fstab /etc/fstab -v /' failed(1): Warning: Permanently added 'ec2-50-17-152-241.compute-1.amazonaws.com,50.17.152.241' (RSA) to the list of known hosts.
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00356732 s, 294 MB/s
mke2fs 1.41.12 (17-May-2010)
Traceback (most recent call last):
  File "/usr/bin/euca-bundle-vol", line 492, in <module>
    main()
  File "/usr/bin/euca-bundle-vol", line 467, in main
    ancestor_ami_ids,
  File "/usr/lib/python2.7/site-packages/euca2ools/__init__.py", line 994, in generate_manifest
    user_pub_key = X509.load_cert(cert_path).get_pubkey().get_rsa()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 611, in load_cert
    return load_cert_bio(bio)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 639, in load_cert_bio
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140144859658016:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:698:Expecting: CERTIFICATE

The last line is most informative here in indicating a PEM error, but note that the exception began several lines prior.

The most common cause of this error is that the keys are input incorrectly in Conductor. Your “Key” should be the .pk file from Amazon, and the “Certificate” should contain your .cert file. Please see this documentation if more information is needed.
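
Since the message says the PEM reader found "no start line" while expecting CERTIFICATE, one simple sanity check is to confirm that the file you plan to paste into the “Certificate” field actually contains a certificate header. The sketch below is illustrative only; the helper name has_pem_certificate_header and the file name in the commented example are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains the PEM start line
# that the "Expecting: CERTIFICATE" message refers to.
has_pem_certificate_header() {
    grep -q -e '^-----BEGIN CERTIFICATE-----' "$1"
}

# Example (commented out); "cert-example.pem" is a placeholder file name:
# has_pem_certificate_header cert-example.pem \
#     || echo "this file does not look like a PEM certificate"
#
# If OpenSSL is installed, it can also try to parse the file:
# openssl x509 -in cert-example.pem -noout -subject
```

If the check fails on the file used as the certificate but passes on the other one, the two fields were probably swapped.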

OzException: No disk activity in 300 seconds, failing

2011-07-27 05:47:25,596 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Exception caught in ImageFactory
2011-07-27 05:47:25,596 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 124, in build_image
    self.build_upload(build_id)
  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 172, in build_upload
    libvirt_xml = self.guest.install(self.app_config["timeout"])
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1154, in install
    return self.do_install(timeout, force, 0)
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1139, in do_install
    self.wait_for_install_finish(dom, timeout)
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 451, in wait_for_install_finish
    raise oz.OzException.OzException("No disk activity in %d seconds, failing" % (inactivity_timeout))
OzException: No disk activity in 300 seconds, failing

Oz will fail a build if it has run for 300 seconds without any disk activity, as this indicates that something has gone wrong. This error can indicate a number of things, but the most common problem is a virtual network problem.

When this exception is raised, a screenshot of the guest should be saved before the instance is terminated. Although the location is configurable, the default location is directly in / on the parent host running Oz. It will be a PNG screenshot showing the install process.

The most commonly-encountered error shown in the screenshot is something to the effect of “ERROR reading package metadata: Cannot retrieve repository metadata (repomd.xml)”. Most of the time, this is because the virtual machine the guest is running inside of cannot reach the network to obtain a mirror. You should ensure that virtual machines are able to reach the network via a bridged interface. For many users, service libvirtd restart has cleared the problem.

It is also possible, though it happens much less frequently, that whatever repository you are pointing to is actually unavailable or corrupt.

raise OzException.OzException("Timed out waiting for install to finish")

This error, seen in /var/log/imagefactory.log, could be caused by a variety of things, if not the above problem. Try finding the PNG screenshot. It’s often written directly to /, but you can also try to find it with lsof -p <pid_of_imagefactory>.
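
To look for a recently written screenshot, something like the following may help. It is a sketch under assumptions: the helper name recent_pngs is made up, it assumes the screenshot has a .png extension, and it relies on the -maxdepth and -mmin options of GNU find.

```shell
#!/bin/sh
# Hypothetical helper: list PNG files directly inside a directory
# (default: /) that were modified within the last N minutes (default: 60).
recent_pngs() {
    dir=${1:-/}
    minutes=${2:-60}
    find "$dir" -maxdepth 1 -type f -name '*.png' -mmin "-$minutes"
}

# Example (commented out): screenshots written to / in the last two hours.
# recent_pngs / 120
```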

Images Fail to Push

In these cases, an image has built successfully, but errors are encountered when pushing it to the provider, before the image is launched.

ValueError: 'template' must be a UUID, URL, XML string or XML document path…

This error, seen in /var/log/imagefactory.log:

2011-06-23 09:49:59,444 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(30198) Message: Querying (http://localhost:9090/target_images/_query) with expression ($build == "55c8c590-2b0c-4a2e-b128-e0f96a286f00" && $target == "condorcloud")
2011-06-23 09:49:59,446 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(30198) Message: Getting metadata (['template']) from http://localhost:9090/target_images/None
2011-06-23 09:49:59,448 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(30198) Message: Created Image Warehouse instance http://localhost:9090 - buckets(target_images, templates, icicles, provider_images)
2011-06-23 09:49:59,448 ERROR imagefactory.qmfagent.ImageFactoryAgent.ImageFactoryAgent pid(30198) Message: 'template' must be a UUID, URL, XML string or XML document path...
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imagefactory/qmfagent/ImageFactoryAgent.py", line 74, in method
    result = getattr(target_obj, methodName)(**args)
  File "/usr/lib/python2.7/site-packages/imagefactory/qmfagent/ImageFactory.py", line 125, in push_image
    return BuildDispatcher().push_image_to_providers(image, build, providers, credentials, BuildAdaptor, self.agent)
  File "/usr/lib/python2.7/site-packages/imagefactory/BuildDispatcher.py", line 78, in push_image_to_providers
    job = job_cls(template, target, image_id, build_id, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imagefactory/qmfagent/BuildAdaptor.py", line 76, in __init__
    super(BuildAdaptor, self).__init__(template, target, image_id, build_id)
  File "/usr/lib/python2.7/site-packages/imagefactory/BuildJob.py", line 42, in __init__
    self.template = template if isinstance(template, Template) else Template(template)
  File "/usr/lib/python2.7/site-packages/imagefactory/Template.py", line 83, in __init__
    raise ValueError("'template' must be a UUID, URL, XML string or XML document path...")
ValueError: 'template' must be a UUID, URL, XML string or XML document path...

With this error, it’s important to read back a bit. It rarely (if ever) indicates the template as the root problem. Reading back, note that the URL we try to fetch is http://localhost:9090/target_images/None. Scrolling up slightly further, the first lines indicate $target == "condorcloud". If you’re not using Condor Cloud as a backend, this is an error!

Currently, the code is structured so that condorcloud is the fallback provider, so this typically indicates that the provider name you tried to push to does not exist. For example, this error was recently encountered while trying to push to a vSphere provider named “vmware2” in Conductor and used on the command line, but /etc/vmware.json used “vmware” as the name, so the push failed. You may need to dig a little bit if this isn’t exactly what you were doing, but this error typically points to some sort of problem with the provider name you are specifying.

If the “condorcloud” portion is missing, it could be another problem, such as a build that was deleted before pushing (rare), a full disk, or attempting to push before the build has finished.
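
A quick way to catch a provider name mismatch like the “vmware2” versus “vmware” case is to search the provider configuration file for the exact quoted name. The sketch below is only a hint-level check under assumptions: the helper name provider_name_in_config is hypothetical, and it does a plain text search rather than parsing the JSON, because the layout of the file is not described here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the double-quoted provider name appears
# anywhere in the given config file. This is a fixed-string text search,
# not a JSON parse, so a match is a hint rather than proof.
provider_name_in_config() {
    name=$1
    config=$2
    grep -qF -e "\"$name\"" "$config"
}

# Example (commented out):
# provider_name_in_config vmware2 /etc/vmware.json \
#     || echo "provider name vmware2 not found in /etc/vmware.json"
```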

Pushing with mock provider

Names of mock providers must begin with the string “mock”. The match is case-sensitive, so “mock” and “mock provider” are valid names for mock providers, but “Mock” and “my mock provider” are not.
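
The naming rule can be expressed as a simple prefix test. This is a sketch of the rule as stated above, not the code Conductor or Image Factory actually uses; the helper name is_mock_provider_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the name begins with the lowercase
# string "mock". The match is case-sensitive.
is_mock_provider_name() {
    case $1 in
        mock*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example (commented out):
# is_mock_provider_name "mock provider" && echo "valid mock provider name"
# is_mock_provider_name "Mock" || echo "not a valid mock provider name"
```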

Deployments Fail to Launch

Create_Instance_Failure: Failed to perform transfer: Server returned nothing

Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)

This error typically indicates that deltacloudd timed out before any response could be sent. This often indicates a problem with the remote cloud provider. For example, deltacloudd may time out the connection before vSphere can create an instance if vSphere has to copy the base image before launch and is using an NFS mount to do so. In the short term, you can try to restart deltacloudd with a longer timeout. Long term, we’re working on improving handling here so that this won’t be a problem.

Error Adding Provider Accounts

EC2 and Single Deltacloud-Core

With single deltacloud-core, when adding a provider manually through Conductor, make sure you have entered a value for X-Deltacloud-Provider that matches the EC2 realm you plan to use. “us-east-1” and “us-west-1” are the values configure uses to set up the default EC2 providers.

If X-Deltacloud-Provider is not set, you will see this error when you attempt to create an EC2 provider account:

W, [2011-10-03T14:56:10.570864 #1222]  WARN -- : ##### Aws::Ec2 request: :443/?AWSAccessKeyId=blah&Action=DescribeAvailabilityZones&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2011-10-03T21%3A56%3A10.000Z&Version=2010-08-31&Signature=aJUyhDLHhESPAg8YFZnKii1W7wp0mTWR95epcpdR7Ac%3D ####
I, [2011-10-03T14:56:10.571143 #1222]  INFO -- : Closing HTTPS connection to :443, reason: 'Aws::AWSErrorHandler: code: 403: 'Forbidden', probability: 10%'
I, [2011-10-03T14:56:22.002211 #1222]  INFO -- : New Aws::Ec2 using per_thread-connection mode
I, [2011-10-03T14:56:22.002928 #1222]  INFO -- : Opening new HTTPS connection to :443
W, [2011-10-03T14:56:22.022209 #1222]  WARN -- : ##### Aws::Ec2 returned an error: 403 Forbidden
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
        <head>
                <title>Test Page for the Apache HTTP Server on Fedora</title>
                <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
                <style type="text/css">

Error starting Conductor

This sometimes comes up when starting Conductor, typically via rails server:

>> Listening on 0.0.0.0:3000, CTRL+C to stop
Exiting
/home/mawagner/.rvm/gems/ruby-1.8.7-p357@conductor-latejuly/gems/eventmachine-0.12.10/lib/eventmachine.rb:572:in `start_tcp_server': no acceptor (RuntimeError)
    from /home/mawagner/.rvm/gems/ruby-1.8.7-p357@conductor-latejuly/gems/eventmachine-0.12.10/lib/eventmachine.rb:572:in `start_server'
    from /home/mawagner/.rvm/gems/ruby-1.8.7-p357@conductor-latejuly/gems/thin-1.3.1/lib/thin/backends/tcp_server.rb:16:in `connect'
...

Despite the cryptic error, this has a simple cause: something is already listening on that port, most likely the RPM version of Conductor. Stop it with sudo service aeolus-conductor stop or see what’s running on the port with sudo lsof -i :3000 and stop that.

Missing layout app/views/layouts/converge-ui/shell_layout

If a new checkout fails to start with this error, it’s because the converge-ui submodule needs to be configured.

From the base of the checkout, run git submodule init and then git submodule update. Then you’ll be all set.

Test failures

undefined method `to_r' for 36300.0324699879:Float

This appears to be a bug with REE. Use normal Ruby.

undefined method `name' for rest-client

See if you have both a Gemfile and a Gemfile.in. You should not.
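
A small check for this situation might look like the following. It is only a sketch; the helper name has_conflicting_gemfiles is an assumption, and deciding which of the two files to remove is left to you.

```shell
#!/bin/sh
# Hypothetical helper: succeed if both Gemfile and Gemfile.in exist in
# the given directory (default: the current directory).
has_conflicting_gemfiles() {
    dir=${1:-.}
    [ -e "$dir/Gemfile" ] && [ -e "$dir/Gemfile.in" ]
}

# Example (commented out), run from the base of the checkout:
# has_conflicting_gemfiles && echo "both Gemfile and Gemfile.in are present"
```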
