Common Error Messages
This page lists commonly-encountered errors with Aeolus. If you encounter problems, you’re encouraged to document them here.
The 0.10.x series of Aeolus has a bug where the admin user sometimes doesn’t get created.
The bug is fixed in newer development code, but for 0.10.x you’ll need to follow these simple steps as a workaround:
Change to the root user:
$ sudo su
Run the following commands:
# export RAILS_ENV=production
# cd /usr/share/aeolus-conductor
# rake dc:create_admin_user
This should create the admin user for you, so you can log in using admin/password.
There is a high probability your time setting is incorrect. Common causes:
- wrong timezone
- NTP server not running, or the clock being too far from real time
If the clock is too far off:
# systemctl stop ntpd.service
# ntpdate [YOUR FAVOURITE NTP SERVER, e.g., pool.ntp.org]
# systemctl start ntpd.service
(Make sure /etc/ntp.conf includes working NTP servers, too. Setting the clock with ntpdate is just a quick short-term measure to bring your clock roughly in sync with reality.)
If you have tried to restart mongod and still get this database error,
it is most likely a stale lock file that gets left around if mongod is
shut down uncleanly. Stop mongod (as a formality, at least),
rm /var/lib/mongodb/mongod.lock, and start it again. (Note that a mongod
failure may require that you restart iwhd as well:
service iwhd restart.)
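The lock-file cleanup above can be sketched as a small shell function. This is only an illustration: the function name and its optional arguments are hypothetical, and the default path is the one quoted in the text. It refuses to delete the lock while a mongod process is still running.

```shell
#!/bin/sh
# Remove a stale mongod lock file, but only when the named process is not running.
# $1: lock file path (defaults to the path mentioned in the text)
# $2: process name to look for (defaults to mongod); hypothetical convenience argument
remove_stale_mongod_lock() {
    lock_file="${1:-/var/lib/mongodb/mongod.lock}"
    proc_name="${2:-mongod}"

    # Do not touch the lock while the daemon is still alive.
    if pgrep -x "$proc_name" >/dev/null 2>&1; then
        echo "$proc_name is still running; stop it first" >&2
        return 1
    fi

    if [ -e "$lock_file" ]; then
        rm -f -- "$lock_file" && echo "removed $lock_file"
    else
        echo "no lock file at $lock_file"
    fi
}
```

Run it as root after stopping mongod (for example, remove_stale_mongod_lock with no arguments), then start mongod and, if needed, restart iwhd.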
With an error like the following:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/imgfac/Builder.py", line 115, in _build_image_from_template
self.os_plugin.create_base_image(self, template, parameters)
File "/usr/lib/python2.7/site-packages/imagefactory_plugins/FedoraOS/FedoraOS.py", line 255, in create_base_image
self._init_oz()
File "/usr/lib/python2.7/site-packages/imagefactory_plugins/FedoraOS/FedoraOS.py", line 237, in _init_oz
self.init_guest()
File "/usr/lib/python2.7/site-packages/imagefactory_plugins/FedoraOS/FedoraOS.py", line 305, in init_guest
raise ImageFactoryException("OS plugin does not support distro (%s) update (%s) in TDL" % (self.tdlobj.distro, self.tdlobj.update) )
ImageFactoryException: OS plugin does not support distro (Fedora) update (16) in TDL
First, take the error at face value and see if Oz supports the distro
and update in question (Fedora 16 in the above). Running oz-install
with no arguments will list all supported distros.
Assuming your distro and version are in the supported list, this probably
means that your server does not actually support virtualization. In
my case, service libvirtd status showed that the service was dead;
starting it moved me along, but image builds then failed with this
error:
libvirtError: internal error process exited while connecting to monitor: Could not access KVM kernel module: Permission denied
failed to initialize KVM: Permission denied
No accelerator found!
This particular machine claims the vmx CPU flag, but it possibly has hardware virtualization disabled in the BIOS, or is similarly in a nutty state.
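As a rough first check for the situation above, the following sketch looks for the vmx (Intel) or svm (AMD) CPU flag and tests whether the KVM device node is usable by the current user. The function names are hypothetical, and /dev/kvm is assumed to be the usual device path. As the machine described above shows, the flag can be present while virtualization is still disabled in the BIOS, so a passing flag check is not conclusive.

```shell
#!/bin/sh
# Succeeds if the given cpuinfo file lists the vmx or svm flag.
# $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo)
has_virt_flag() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -E -q '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo"
}

# Succeeds if the KVM device exists as a character device and the
# current user can read and write it.
# $1: device path (defaults to /dev/kvm, an assumed standard location)
kvm_device_usable() {
    dev="${1:-/dev/kvm}"
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}
```

For example, `has_virt_flag && kvm_device_usable || echo "KVM is probably not usable here"` gives a quick yes/no before attempting another build.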
An error such as the following indicates a problem with your certificate:
2011-07-27 11:37:18,890 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(7146) Message: Exception caught in ImageFactory
2011-07-27 11:37:18,893 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(7146) Message: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 499, in push_image
self.push_image_snapshot(target_image_id, provider, credentials)
File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 511, in push_image_snapshot
self.push_image_snapshot_ec2(target_image_id, provider, credentials)
File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 810, in push_image_snapshot_ec2
stdout, stderr, retcode = self.guest.guest_execute_command(guestaddr, command)
File "/usr/lib/python2.6/site-packages/oz/RedHat.py", line 365, in guest_execute_command
"root@" + guestaddr, command])
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 77, in subprocess_check_output
stderr))
OzException: 'ssh -i /tmp/tmpDxALEm -o ServerAliveInterval=30 -o StrictHostKeyChecking=no -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o PasswordAuthentication=no root@ec2-50-17-152-241.compute-1.amazonaws.com euca-bundle-vol -c /tmp/tmp4XZlqz -k /tmp/tmpnYntjp -u 0101-1998-2923 -e /mnt,/tmp,/root/.ssh --arch x86_64 -d /mnt/bundles --kernel aki-427d952b -p db231b1a-6ef1-4a61-b253-8fc6024c8564 -s 10240 --ec2cert /tmp/cert-ec2.pem --fstab /etc/fstab -v /' failed(1): Warning: Permanently added 'ec2-50-17-152-241.compute-1.amazonaws.com,50.17.152.241' (RSA) to the list of known hosts.
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00356732 s, 294 MB/s
mke2fs 1.41.12 (17-May-2010)
Traceback (most recent call last):
File "/usr/bin/euca-bundle-vol", line 492, in <module>
main()
File "/usr/bin/euca-bundle-vol", line 467, in main
ancestor_ami_ids,
File "/usr/lib/python2.7/site-packages/euca2ools/__init__.py", line 994, in generate_manifest
user_pub_key = X509.load_cert(cert_path).get_pubkey().get_rsa()
File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 611, in load_cert
return load_cert_bio(bio)
File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 639, in load_cert_bio
raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140144859658016:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:698:Expecting: CERTIFICATE
The last line is most informative here in indicating a PEM error, but note that the exception began several lines prior.
The most common cause of this error is that the keys are input
incorrectly in Conductor. Your “Key” should be the .pk file from
Amazon, and the “Certificate” should contain your .cert file. Please
see this documentation if more information is needed.
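Before pasting the files into Conductor, it can help to confirm which one is which. The sketch below assumes both files are PEM-encoded, which the "Expecting: CERTIFICATE" message above suggests for the certificate, and simply looks for the PEM start lines. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the file contains a PEM certificate start line.
looks_like_pem_cert() {
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Succeeds if the file contains a PEM private key start line
# (either the "RSA PRIVATE KEY" or the plain "PRIVATE KEY" form).
looks_like_pem_key() {
    grep -E -q -e '-----BEGIN (RSA )?PRIVATE KEY-----' "$1"
}
```

If looks_like_pem_cert fails on the file you entered as the “Certificate”, or succeeds on the one you entered as the “Key”, the two fields are probably swapped or the wrong file was used.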
2011-07-27 05:47:25,596 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Exception caught in ImageFactory
2011-07-27 05:47:25,596 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 124, in build_image
self.build_upload(build_id)
File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 172, in build_upload
libvirt_xml = self.guest.install(self.app_config["timeout"])
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1154, in install
return self.do_install(timeout, force, 0)
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1139, in do_install
self.wait_for_install_finish(dom, timeout)
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 451, in wait_for_install_finish
raise oz.OzException.OzException("No disk activity in %d seconds, failing" % (inactivity_timeout))
OzException: No disk activity in 300 seconds, failing
Oz will fail a build if it has run for 300 seconds without any disk activity, as this indicates that something has gone wrong. This error can indicate a number of things, but the most common problem is a virtual network problem.
When this exception is raised, a screenshot of the guest should be saved
before the instance is terminated. Although the location is
configurable, the default location is directly in / on the parent host
running Oz. It will be a PNG screenshot showing the install process.
The most commonly encountered error shown in the screenshot is something
to the effect of “ERROR reading package metadata: Cannot retrieve
repository metadata (repomd.xml)”. Most of the time, this is because the
guest virtual machine cannot reach the network to obtain a mirror. You
should ensure that virtual machines are able to reach the network via a
bridged interface. For many users, service libvirtd restart has cleared
the problem.
It is also possible, though it happens much less frequently, that whatever repository you are pointing to is actually unavailable or corrupt.
This error, seen in /var/log/imagefactory.log, could be caused by a
variety of things, if not the above problem. Try finding the PNG
screenshot. It’s often written directly to /, but you can also try to
find it with lsof -p <pid_of_imagefactory>.
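A quick way to look for the screenshot is to list PNG files modified within the last day in the directory where Oz writes them. This is a minimal sketch: the function name is hypothetical, the default directory is the / location mentioned above, and the one-day window is an arbitrary choice.

```shell
#!/bin/sh
# List PNG files modified within the last day directly inside a directory,
# without descending into subdirectories.
# $1: directory to search (defaults to /, the default screenshot location)
list_recent_pngs() {
    dir="${1:-/}"
    find "$dir" -maxdepth 1 -type f -name '*.png' -mtime -1 2>/dev/null
}
```

If the screenshot location was changed in the configuration, pass that directory as the argument instead.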
In these cases, an image has built successfully, but errors are encountered when trying to push it to the provider (but prior to launching the image).
This error, seen in /var/log/imagefactory.log:
2011-06-23 09:49:59,444 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(30198) Message: Querying (htt
p://localhost:9090/target_images/_query) with expression ($build == "55c8c590-2b0c-4a2e-b128-e0f96a286f00"
&& $target == "condorcloud")
2011-06-23 09:49:59,446 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(30198) Message: Getting metad
ata (['template']) from http://localhost:9090/target_images/None
2011-06-23 09:49:59,448 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(30198) Message: Created Image
Warehouse instance http://localhost:9090 - buckets(target_images, templates, icicles, provider_images)
2011-06-23 09:49:59,448 ERROR imagefactory.qmfagent.ImageFactoryAgent.ImageFactoryAgent pid(30198) Message
: 'template' must be a UUID, URL, XML string or XML document path...
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/imagefactory/qmfagent/ImageFactoryAgent.py", line 74, in method
result = getattr(target_obj, methodName)(**args)
File "/usr/lib/python2.7/site-packages/imagefactory/qmfagent/ImageFactory.py", line 125, in push_image
return BuildDispatcher().push_image_to_providers(image, build, providers, credentials, BuildAdaptor, s
elf.agent)
File "/usr/lib/python2.7/site-packages/imagefactory/BuildDispatcher.py", line 78, in push_image_to_provi
ders
job = job_cls(template, target, image_id, build_id, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/imagefactory/qmfagent/BuildAdaptor.py", line 76, in __init__
super(BuildAdaptor, self).__init__(template, target, image_id, build_id)
File "/usr/lib/python2.7/site-packages/imagefactory/BuildJob.py", line 42, in __init__
self.template = template if isinstance(template, Template) else Template(template)
File "/usr/lib/python2.7/site-packages/imagefactory/Template.py", line 83, in __init__
raise ValueError("'template' must be a UUID, URL, XML string or XML document path...")
ValueError: 'template' must be a UUID, URL, XML string or XML document path...
With this error, it’s important to read back a bit. It rarely (if ever)
indicates the template as the root problem. Reading back, note that the
URL we try to fetch is http://localhost:9090/target_images/None.
Scrolling up slightly further, the first lines indicate
$target == "condorcloud". If you’re not using Condor Cloud as a
backend, this is an error!
Currently, the code is structured so that condorcloud is the fallback
provider, so this typically indicates that the provider name you tried
to push to does not exist. For example, this error was recently
encountered while trying to push to a vSphere provider that was named
“vmware2” in Conductor and on the command line, while /etc/vmware.json
used “vmware” as the name, so the push failed. You may need to dig a
little bit if this isn’t exactly what you were doing, but this error
typically points to some sort of problem with the provider name you are
specifying.
If the “condorcloud” portion is missing, it could be another problem, such as a build that was deleted before pushing (rare), a full disk, or attempting to push before the build has finished.
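For the vSphere case described above, a quick sanity check is to see whether the provider name you are pushing to appears, as a quoted string, in the provider's JSON file. The sketch below makes no assumption about the layout of /etc/vmware.json; it only searches for the literal quoted name, and the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the provider name appears as a double-quoted string in the file.
# $1: provider name as entered in Conductor or on the command line
# $2: provider config file, e.g. /etc/vmware.json
provider_name_in_file() {
    name="$1"
    file="$2"
    grep -q -F -- "\"$name\"" "$file"
}
```

In the example above, `provider_name_in_file vmware2 /etc/vmware.json` would have failed, pointing at the mismatch before the push was attempted.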
Names of mock providers must begin with the string “mock”. So, “mock” or “mock provider” are valid names for mock providers, but “Mock” or “my mock provider” are not.
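The naming rule above amounts to a case-sensitive prefix match. As an illustration (the function name is hypothetical and this is not Conductor's own validation code):

```shell
#!/bin/sh
# Succeeds if the name starts with the lowercase string "mock".
is_valid_mock_name() {
    case "$1" in
        mock*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

With this check, “mock” and “mock provider” pass, while “Mock” and “my mock provider” fail.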
Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)
This error typically indicates that deltacloudd timed out before any response could be sent. This often indicates a problem with the remote cloud provider. For example, deltacloudd may time out the connection before vSphere can create an instance if vSphere has to copy the base image before launch and is using an NFS mount to do so. In the short term, you can try to restart deltacloudd with a longer timeout. Long term, we’re working on improving handling here so that this won’t be a problem.
With a single deltacloud-core, when adding a provider manually through Conductor, make sure you have entered a value for X-Deltacloud-Provider that matches the ec2 realm you plan to use. “us-east-1” and “us-west-1” are the values configure uses to set up the default ec2 providers.
If X-Deltacloud-Provider is not set, you will see this error when you attempt to create an ec2 provider account.
W, [2011-10-03T14:56:10.570864 #1222] WARN -- : ##### Aws::Ec2 request: :443/?AWSAccessKeyId=blah
&Action=DescribeAvailabilityZones&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=
2011-10-03T21%3A56%3A10.000Z&Version=2010-08-31&Signature=aJUyhDLHhESPAg8YFZnKii1W7wp0mTWR95epcpdR7Ac%
3D ####
I, [2011-10-03T14:56:10.571143 #1222] INFO -- : Closing HTTPS connection to :443, reason: 'Aws::AWSEr
rorHandler: code: 403: 'Forbidden', probability: 10%'
I, [2011-10-03T14:56:22.002211 #1222] INFO -- : New Aws::Ec2 using per_thread-connection mode
I, [2011-10-03T14:56:22.002928 #1222] INFO -- : Opening new HTTPS connection to :443
W, [2011-10-03T14:56:22.022209 #1222] WARN -- : ##### Aws::Ec2 returned an error: 403 Forbidden
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Test Page for the Apache HTTP Server on Fedora</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<style type="text/css">
This sometimes comes up when starting Conductor, typically via rails server:
>> Listening on 0.0.0.0:3000, CTRL+C to stop
Exiting
/home/mawagner/.rvm/gems/ruby-1.8.7-p357@conductor-latejuly/gems/eventmachine-0.12.10/lib/eventmachine.rb:572:in `start_tcp_server': no acceptor (RuntimeError)
from /home/mawagner/.rvm/gems/ruby-1.8.7-p357@conductor-latejuly/gems/eventmachine-0.12.10/lib/eventmachine.rb:572:in `start_server'
from /home/mawagner/.rvm/gems/ruby-1.8.7-p357@conductor-latejuly/gems/thin-1.3.1/lib/thin/backends/tcp_server.rb:16:in `connect'
...
Despite the cryptic error, this has a simple cause: something is already
listening on that port, most likely the RPM version of Conductor. Stop
it with sudo service aeolus-conductor stop, or see what’s running on
the port with sudo lsof -i :3000 and stop that.
If a new checkout fails to start with this error, it’s because the converge-ui submodule needs to be configured.
From the base of the checkout, run git submodule init and then
git submodule update. Then you’ll be all set.
This appears to be a bug with REE. Use normal Ruby.
See if you have both a Gemfile and a Gemfile.in. You should not.
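The check above can be scripted from the base of the checkout. This is a minimal sketch with a hypothetical function name; it only reports whether both files are present and does not decide which one to keep.

```shell
#!/bin/sh
# Succeeds if both Gemfile and Gemfile.in exist in the given directory.
# $1: directory to inspect (defaults to the current directory)
has_conflicting_gemfiles() {
    dir="${1:-.}"
    [ -e "$dir/Gemfile" ] && [ -e "$dir/Gemfile.in" ]
}
```

For example, `has_conflicting_gemfiles && echo "both Gemfile and Gemfile.in are present"`.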