[BUG] Minion upgrade to 3006.9 fails on Windows with "specified service already exists" error #67054

Open
jmcmillan89 opened this issue Nov 18, 2024 · 1 comment
Labels: Bug (broken, incorrect, or confusing behavior), needs-triage

Comments

@jmcmillan89

Description
After upgrading our Salt master to 3006.9, we have turned our attention to getting our minions upgraded to 3006.9 as well. We are having difficulties with our Windows minions, as the silent install/upgrade does not complete successfully. When upgrading manually, it errors out at the very end of the installation because the salt-minion service already exists. Just to be clear, when I say upgrade, I mean that we already have the Salt minion installed and running at an older version, and are just installing from a newer Salt minion binary (this has worked for us in the past).

Apologies for submitting this as a bug, but I couldn't find a more appropriate type.

Setup
We have many Windows machines in our organization running various versions of Windows Server -- some as old as 2012 (very few) but most are 2016 or 2019. All of our minions are currently on version 3004.2, and we are attempting a silent upgrade using the following simple Salt state:

Install salt minion 3006.9:
  file.managed:
    - source: https://my-company-artifactory/saltproject-generic-remote/windows/3006.9/Salt-Minion-3006.9-Py3-AMD64-Setup.exe
    - name: "C:\\Windows\\Temp\\Salt-Minion-3006.9-Py3-AMD64-Setup.exe"
    - skip_verify: True
  cmd.run:
    - name: "C:\\Windows\\Temp\\Salt-Minion-3006.9-Py3-AMD64-Setup.exe /S"

We apply it with salt -L my-minion.foo.bar state.apply minion.upgrade. For reference, this is the exact same state we used to upgrade our Windows minions to 3004.2, just with that version's binary. The state run itself is expected to report a failure, because the minion restarts during the installation.

After applying this state, if I do a salt -L my-minion.foo.bar grains.get saltversion it is still on 3004.2 -- if I hop onto the machine, and look into C:\Program Files\Salt Project\Salt I can see that the directory structure has been updated -- e.g. there is salt-call.exe instead of salt-call.bat -- but there are also still relics from the old 3004.2 installation, including the bin/ directory:
[Screenshot: contents of C:\Program Files\Salt Project\Salt after the upgrade attempt]
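
For checking many minions at once, the same grains.get saltversion call can be run with JSON output and filtered on the master. The snippet below is a minimal sketch, assuming the output of salt --out=json --static is a single JSON object mapping minion ID to the returned value. The helper name minions_not_at and the second minion ID are hypothetical.

```python
import json
from typing import List


def minions_not_at(target: str, salt_json: str) -> List[str]:
    """Return the minion IDs whose reported saltversion differs from target.

    salt_json is assumed to be the text printed by something like:
        salt -L <minions> grains.get saltversion --out=json --static
    """
    reported = json.loads(salt_json)
    # A minion that did not answer may map to a non-string value; it is
    # listed as "not at target" as well.
    return sorted(minion for minion, version in reported.items() if version != target)


# Hypothetical sample: one minion still on the old version, one upgraded.
sample = '{"my-minion.foo.bar": "3004.2", "other-minion.foo.bar": "3006.9"}'
print(minions_not_at("3006.9", sample))
```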

If I inspect the salt-minion service, I can see that it is still pointed to C:\Program Files\Salt Project\Salt\bin\ssm.exe instead of C:\Program Files\Salt Project\Salt\salt-minion.exe like I would expect:
[Screenshot: salt-minion service properties]
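
The half-upgraded condition described here can be detected from the service's executable path alone. The sketch below assumes that a path inside the bin subdirectory of the install directory indicates the old 3004.2 layout, as observed above. The function name uses_legacy_layout is hypothetical, and ntpath is used so the check also runs on a non-Windows master.

```python
import ntpath

# Install directory taken from the paths quoted in this report.
INSTALL_DIR = r"C:\Program Files\Salt Project\Salt"


def uses_legacy_layout(service_exe: str, install_dir: str = INSTALL_DIR) -> bool:
    """Return True if the service executable sits in the old bin subdirectory."""
    # Strip surrounding quotes, normalise separators, and ignore case,
    # since Windows paths are case-insensitive.
    exe = ntpath.normcase(ntpath.normpath(service_exe.strip('"')))
    legacy_bin = ntpath.normcase(ntpath.join(install_dir, "bin"))
    return ntpath.dirname(exe) == legacy_bin


print(uses_legacy_layout(r"C:\Program Files\Salt Project\Salt\bin\ssm.exe"))
print(uses_legacy_layout(r"C:\Program Files\Salt Project\Salt\salt-minion.exe"))
```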

To my eyes, it looks as though the minion is in a "half-state" between 3004.2 and 3006.9. So, I uninstalled the Salt minion, re-installed at 3004.2, then manually did an upgrade to 3006.9 (basically, just run the .exe and click through the screens). It almost got to the end of the install when it failed with the following error:
[Screenshot: installer error dialog reporting that the specified service already exists]

It would appear that this binary cannot handle the scenario where the salt-minion service already exists -- in my experience, this was not the case when upgrading to 3004.2. As a result, we are stuck on automating the upgrade of our Windows minions, as I cannot find a way to work around this.

For this testing I am using an on-prem machine, and I have also tried using my own laptop and gotten the same result.

Expected behavior
I would expect the Salt-Minion-3006.9-Py3-AMD64-Setup.exe binary to be able to handle the use-case where the salt-minion service already exists, and upgrade/re-configure it accordingly.

One workaround I tried was to script the upgrade by executing a PowerShell script with cmd.run in the background -- so basically, the master will kick off this script and then return immediately, not waiting for it to complete:

# Perform a silent install of the new minion binary
$x = Start-Process -FilePath "C:\Windows\Temp\Salt-Minion-3006.9-Py3-AMD64-Setup.exe" -ArgumentList @("/S") -Wait -PassThru -NoNewWindow

# This does upgrade the minion, HOWEVER it fails to update the SERVICE b/c the service already exists
# These commands set the service properties to what they would be if we did a CLEAN/FRESH install of 3006.9
$x = Start-Process -FilePath "C:\Program Files\Salt Project\Salt\ssm.exe" -ArgumentList @("set salt-minion Application C:\Program Files\Salt Project\Salt\salt-minion.exe") -Wait -PassThru -NoNewWindow
$x = Start-Process -FilePath "C:\Program Files\Salt Project\Salt\ssm.exe" -ArgumentList @('set salt-minion AppParameters -c "C:\ProgramData\Salt Project\Salt\conf" -l quiet') -Wait -PassThru -NoNewWindow
$x = Start-Process -FilePath "C:\Program Files\Salt Project\Salt\ssm.exe" -ArgumentList @("set salt-minion AppDirectory C:\Program Files\Salt Project\Salt") -Wait -PassThru -NoNewWindow

# Stop the minion
$x = Start-Process -FilePath "C:\Program Files\Salt Project\Salt\ssm.exe" -ArgumentList @("stop salt-minion") -Wait -PassThru -NoNewWindow

# Delete the old bin directory
# This is leftovers from the old minion (3004.2) and the service was pointing to files in here
# Now that we have updated the service to use the new files, this directory can be removed
Remove-Item "C:\Program Files\Salt Project\Salt\bin" -Force -Recurse

# Start the minion
$x = Start-Process -FilePath "C:\Program Files\Salt Project\Salt\ssm.exe" -ArgumentList @("start salt-minion") -Wait -PassThru -NoNewWindow

This is not working for me either. My ssm.exe commands to edit the service configuration are not taking effect, and the script abruptly stops as soon as the salt-minion service is stopped or restarted. I have a hunch that even though the script is running in the background, its parent process is still the Salt minion service, so killing the service kills the script. This approach also edits the service while that same service is running, so I don't even know if it is legitimate -- either way, it's not working and, as you can see, I'm getting desperate.
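
One thing that may be worth ruling out for the ssm.exe edits is argument quoting: the script passes each command as a single string containing unquoted paths with spaces, which could be split at the spaces before ssm.exe sees them. Whether that is actually why the edits did not take effect is not confirmed. The sketch below only shows how the same arguments look when each one is kept separate and quoted as needed, using Python's subprocess.list2cmdline. The helper name ssm_set_command is hypothetical.

```python
import subprocess

# Paths copied from the script above.
SSM = r"C:\Program Files\Salt Project\Salt\ssm.exe"


def ssm_set_command(parameter: str, *values: str) -> str:
    """Build a Windows command line for: ssm.exe set salt-minion <parameter> <values...>.

    Every argument is a separate list element, so any element containing
    a space is wrapped in double quotes when the command line is built.
    """
    args = [SSM, "set", "salt-minion", parameter, *values]
    return subprocess.list2cmdline(args)


print(ssm_set_command("Application", r"C:\Program Files\Salt Project\Salt\salt-minion.exe"))
print(ssm_set_command("AppDirectory", r"C:\Program Files\Salt Project\Salt"))
```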

Is there a recommended or more graceful way to silently upgrade Windows minions? We simply have too many to upgrade them manually.
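
One way to test the parent-process hunch would be to hand the installer to the Windows Task Scheduler, so that it runs outside the minion service's process tree. This is an untested sketch, not a confirmed fix. It only builds the schtasks argument lists; the task name salt-minion-upgrade and both function names are hypothetical.

```python
from typing import List


def schtasks_create_args(task_name: str, command: str, start_time: str = "00:00") -> List[str]:
    """Arguments for creating a one-shot scheduled task that runs as SYSTEM.

    start_time is only a placeholder required by /SC ONCE; the task is
    meant to be triggered on demand with the /Run arguments below.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task name
        "/TR", command,      # command line the task runs
        "/SC", "ONCE",       # one-shot schedule
        "/ST", start_time,   # HH:mm start time, mandatory for ONCE
        "/RU", "SYSTEM",     # run under the SYSTEM account
        "/F",                # overwrite an existing task of the same name
    ]


def schtasks_run_args(task_name: str) -> List[str]:
    """Arguments for starting the task immediately."""
    return ["schtasks", "/Run", "/TN", task_name]


installer = r"C:\Windows\Temp\Salt-Minion-3006.9-Py3-AMD64-Setup.exe /S"
print(schtasks_create_args("salt-minion-upgrade", installer))
print(schtasks_run_args("salt-minion-upgrade"))
```

On the minion, each list could be passed to subprocess.run (or the equivalent command given to cmd.run), which would return as soon as the task is registered and started.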

jmcmillan89 added the Bug and needs-triage labels on Nov 18, 2024.
jmcmillan89 changed the title from "[BUG]" to "[BUG] Silent upgrade to 3006.9 fails on Windows with "specified service already exists" error" on Nov 18, 2024.
jmcmillan89 changed the title from "[BUG] Silent upgrade to 3006.9 fails on Windows with "specified service already exists" error" to "[BUG] Minion upgrade to 3006.9 fails on Windows with "specified service already exists" error" on Nov 18, 2024.
@jmcmillan89
Author

I've done a bit more testing with this and can say with more (though not complete) confidence that this is a bug or regression introduced in 3006.9. After more searching, I decided to try installing/upgrading via the salt-winrepo-ng package repo -- below is how I configured it:

# Add salt-winrepo-ng as a fileserver
gitfs_remotes:
  - https://github.com/saltstack/salt-winrepo-ng.git:
    - all_saltenvs: master
    - mountpoint: salt://win/repo-ng

# Restart salt master
systemctl restart salt-master

Next, I spun up an EC2 instance running Windows Server 2019, installed version 3004.2 of the minion, and keyed it to my master. I then tried to install (upgrade to) version 3006.9 of the minion using pkg.install:

# Refresh package database on minion
salt -L salt_upgrade_sandbox_windows pkg.refresh_db

# Upgrade
salt -L salt_upgrade_sandbox_windows pkg.install salt-minion-py3 version=3006.9

salt_upgrade_sandbox_windows:
    ----------
    salt-minion-py3:
        ----------
        install status:
            task started 

This returned immediately, so I assumed it was running in the background. I waited a few minutes and then ran grains.get saltversion, which returned 3004.2 -- so clearly the minion did not upgrade. I hopped onto my EC2 instance and manually restarted the salt-minion service just in case, and then checked the grains again -- still on version 3004.2.

I went to the remove programs list, and the minion was showing installed as version 3006.9! So it tried to upgrade the minion but I think it got stuck in this half-state again.

So, I uninstalled the minion completely from my EC2 instance, reinstalled version 3004.2 of the minion, re-keyed to my master, only this time I tried to install version 3006.7 of the minion and it worked:

# Refresh package database on minion
salt -L salt_upgrade_sandbox_windows pkg.refresh_db

# Upgrade
salt -L salt_upgrade_sandbox_windows pkg.install salt-minion-py3 version=3006.7

salt_upgrade_sandbox_windows:
    ----------
    salt-minion-py3:
        ----------
        install status:
            task started 

# Wait a few minutes
salt -L salt_upgrade_sandbox_windows grains.get saltversion
salt_upgrade_sandbox_windows:
    3006.7

I decided to do one more test with an even earlier version just to make sure I wasn't crazy:

# I reverted my minion back to 3004.2 again
salt -L salt_upgrade_sandbox_windows grains.get saltversion
salt_upgrade_sandbox_windows:
    3004.2

# Refresh package DB
salt -L salt_upgrade_sandbox_windows pkg.refresh_db
salt_upgrade_sandbox_windows:
    ----------
    failed:
        0
    success:
        312
    total:
        312

# Upgrade to 3006.6
salt -L salt_upgrade_sandbox_windows pkg.install salt-minion-py3 version=3006.6
salt_upgrade_sandbox_windows:
    ----------
    salt-minion-py3:
        ----------
        install status:
            task started

# Check version 1-2 mins later
salt -L salt_upgrade_sandbox_windows grains.get saltversion
salt_upgrade_sandbox_windows:
    3006.6
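
Because pkg.install returns "task started" right away, the manual "wait a few minutes, then check" step can be replaced by a polling loop. The sketch below is one possible shape for it. The get_version callable is a stand-in for whatever queries grains.get saltversion, and the function name, timeout, and interval are all hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_version(
    get_version: Callable[[], Optional[str]],
    target: str,
    timeout: float = 600.0,
    interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_version() until it returns target or timeout seconds pass.

    get_version may return None while the minion is restarting and
    therefore not answering.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_version() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Simulated answers: old version, no answer during the restart, new version.
answers = iter(["3004.2", None, "3006.6"])
print(wait_for_version(lambda: next(answers), "3006.6", sleep=lambda seconds: None))
```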

This time, I was on the VM with the task manager open, and I watched with my own eyes as it installed -- I saw a process with name Salt-Minion-3006.6-Py3-AMD64-Setup.exe show up, and I can now see salt-minion.exe running, and this file doesn't even exist in 3004.2.

I am extremely confident that this is a regression in version 3006.9 of the installer binary, and I don't have the skills or know-how to fix it. Can someone please take a look? I would hate for this to get carried into the next release.
