Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting JVM Options Prevents Elasticsearch from Loading #22

Closed
mikeselander opened this issue Apr 7, 2020 · 14 comments · Fixed by #23
Closed

Setting JVM Options Prevents Elasticsearch from Loading #22

mikeselander opened this issue Apr 7, 2020 · 14 comments · Fixed by #23
Assignees

Comments

@mikeselander
Copy link

Elasticsearch will not start up any longer when setting any JVM options. It doesn't seem to matter what JVM options are set, the act of setting them at all prevents Elasticsearch from loading in the same manner as #20.

For good measure, I also tested this by setting only the chassis/chassis_elasticsearch extension on a Chassis Quickstart install and it also failed.

The following configurations do and don't work:

Works:

php: "7.2"
paths:
    base: ..
    wp: wordpress
    content: content
multisite: true
extensions:
    - humanmade/platform_chassis_extension
elasticsearch:
    repo_version: 6.x
    version: 6.3.1
    plugins: [analysis-icu, ingest-attachment]
hosts:
    - chassis-experiments.local
machine_name: chassis-experiments.local

Does not work:

php: "7.2"
paths:
    base: ..
    wp: wordpress
    content: content
multisite: true
extensions:
    - humanmade/platform_chassis_extension
elasticsearch:
    repo_version: 6.x
    version: 6.3.1
    plugins: [analysis-icu, ingest-attachment]
    jvm_options:
        - "-Xms2024m"
        - "-Xmx2024m"
hosts:
    - chassis-experiments.local
machine_name: chassis-experiments.local

For this next one, I figured that setting the new JVM options might solve the problem, but I got no such luck.

Does not work:

php: "7.2"
paths:
    base: ..
    wp: wordpress
    content: content
multisite: true
extensions:
    - humanmade/platform_chassis_extension
elasticsearch:
    repo_version: 6.x
    version: 6.3.1
    plugins: [analysis-icu, ingest-attachment]
    jvm_options:
        - "-Xms2024m"
        - "-Xmx2024m"
        - "8:-XX:NumberOfGCLogFiles=32"
        - "8:-XX:GCLogFileSize=64m"
        - "8:-XX:+UseGCLogFileRotation"
        - "8:-XX:+PrintTenuringDistribution"
        - "8:-XX:+PrintGCDateStamps"
        - "8:-XX:+PrintGCApplicationStoppedTime"
        - "8:-XX:+UseConcMarkSweepGC"
        - "8:-XX:+UseCMSInitiatingOccupancyOnly"
        - "11:-XX:+UseG1GC"
        - "-XX:+UseG1GC"
        - "11:-XX:InitiatingHeapOccupancyPercent=75"
hosts:
    - chassis-experiments.local
machine_name: chassis-experiments.local
  1. What operating system do you use?
    OSX Catalina

  2. What version of Vagrant are you running?
    2.2.7

  3. Are you using VirtualBox or VMWare and which version are you using?
    Virtualbox 6.1

@BronsonQuick
Copy link
Member

Hey @mikeselander, thanks again for the bug report Sir! I can confirm that I can replicate this. I'm actually not sure if the custom JVM options ever worked as I didn't write the original extension. I can have a look at getting them to work though!

@BronsonQuick BronsonQuick self-assigned this Apr 8, 2020
@mikeselander
Copy link
Author

I can have a look at getting them to work though!

That would be amazing, thank you!

I also have a question related to these/YAML question:
Are these two YAML collection formats accepted the same way into the JVM options? From my cursory research it looks like they should be equivalent but I'm curious if you know:

jvm_options:
   - "-Xms2024m"
   - "-Xmx2024m"
jvm_options: ["-Xms2024m", "-Xmx2024m"]

@BronsonQuick
Copy link
Member

Yeah that's correct about the syntax!

I've just started doing some digging. A cat /var/log/syslog shows "Unrecognized VM option 'PrintGCDateStamps'" so I'll start researching which JVM options are for each Java version and sort them out.

@BronsonQuick
Copy link
Member

BronsonQuick commented Apr 8, 2020

Alrighty, so here's what I've discovered.

@roborourke has used a deep_merge so when you set any jvm_options you need to pass in the defaults he has in this plugin then you wanna override those with the ones you want. So for your example you'd go with this:

php: "7.2"
extensions:
  - chassis/chassis_elasticsearch
elasticsearch:
  repo_version: 6.x
  version: 6.3.1
  plugins: [analysis-icu, ingest-attachment]
  jvm_options:
    - '-Xms2048m'
    - '-Xmx2048m'
    - '-XX:+UseG1GC'
    - '8:-XX:NumberOfGCLogFiles=32'
    - '8:-XX:GCLogFileSize=64m'
    - '8:-XX:+UseGCLogFileRotation'
    - '8:-XX:+PrintTenuringDistribution'
    - '8:-XX:+PrintGCDateStamps'
    - '8:-XX:+PrintGCApplicationStoppedTime'
    - '8:-XX:+UseConcMarkSweepGC'
    - '8:-XX:+UseCMSInitiatingOccupancyOnly'
    - '11:-XX:InitiatingHeapOccupancyPercent=75'
hosts:
  - chassis-experiments.local
machine_name: chassis-experiments.local

However, if you only do that then you're still not gonna get a provision to work successfully because the default VM memory in Chassis is only 1024M so the provision will fail and a cat /var/log/syslog will contain There is insufficient memory for the Java Runtime Environment to continue.

If you do the following then you should get a successful provision:

php: "7.2"
extensions:
  - chassis/chassis_elasticsearch
elasticsearch:
  repo_version: 6.x
  version: 6.3.1
  plugins: [analysis-icu, ingest-attachment]
  jvm_options:
    - '-Xms2048m'
    - '-Xmx2048m'
    - '-XX:+UseG1GC'
    - '8:-XX:NumberOfGCLogFiles=32'
    - '8:-XX:GCLogFileSize=64m'
    - '8:-XX:+UseGCLogFileRotation'
    - '8:-XX:+PrintTenuringDistribution'
    - '8:-XX:+PrintGCDateStamps'
    - '8:-XX:+PrintGCApplicationStoppedTime'
    - '8:-XX:+UseConcMarkSweepGC'
    - '8:-XX:+UseCMSInitiatingOccupancyOnly'
    - '11:-XX:InitiatingHeapOccupancyPercent=75'
hosts:
  - chassis-experiments.local
machine_name: chassis-experiments.local
virtualbox:
  memory: 2048

N.B. You'll need to do a vagrant reload if you use that configuration with an existing machine as vagrant provision won't increase the memory usage.

@roborourke
Copy link
Contributor

Isn’t the point of a deep_merge that you’d only need to pass in overrides? I’m sure that was working so are we positive it wasn’t just the available VM memory that was the issue?

In addition you should ideally give the VM twice as much much memory as you give to ES as it still has to run all the other services

@rmccue
Copy link
Contributor

rmccue commented Apr 8, 2020

I'm going to dig into this today FYI to try and see what changed, as the extension was definitely working previously.

@roborourke
Copy link
Contributor

It was, the changes initially were just to hide some warnings I think. Fwiw a clean install and provision of Altis v3 works fine so I’m hoping it was just the VM memory being the issue

@rmccue
Copy link
Contributor

rmccue commented Apr 8, 2020

Isn’t the point of a deep_merge that you’d only need to pass in overrides? I’m sure that was working so are we positive it wasn’t just the available VM memory that was the issue?

The issue is def the unrecognised parameters in the logs we have; specifically: Unrecognized VM option 'PrintGCDateStamps'.

deep_merge does only pass in the overrides, but you have to specify each of those options, as otherwise there'd be no way to remove anything from that array.


I can confirm that the following configuration (Altis' default) works just fine:

elasticsearch:
    plugins: [analysis-icu, ingest-attachment]

As @mikeselander notes, setting the jvm_options breaks it.

However, the bug appears to be a bit more insidious than one might think. What's actually happening is that ES appears to be adding defaults into the options. The custom settings provided by this extension with no overrides generate this file at /etc/elasticsearch/es/jvm.options:

# This file is managed by Puppet -- es
#
# Set the 'jvm_options' parameter on the elasticsearch class to change this file.

-Dfile.encoding=UTF-8
-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintGCDetails
-XX:+UseG1GC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=75
-Xloggc:/var/log/elasticsearch/es/gc.log
-Xms256m
-Xmx256m
-Xss1m
-server
11:-XX:InitiatingHeapOccupancyPercent=75
8:-XX:+PrintGCApplicationStoppedTime
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+UseCMSInitiatingOccupancyOnly
8:-XX:+UseConcMarkSweepGC
8:-XX:+UseGCLogFileRotation
8:-XX:GCLogFileSize=64m
8:-XX:NumberOfGCLogFiles=32

However, with the overrides, it produces:

# This file is managed by Puppet -- es
#
# Set the 'jvm_options' parameter on the elasticsearch class to change this file.

-Dfile.encoding=UTF-8
-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseGCLogFileRotation
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=75
-XX:GCLogFileSize=64m
-XX:NumberOfGCLogFiles=32
-Xloggc:/var/log/elasticsearch/es/gc.log
-Xms2024m
-Xmx2024m
-Xss1m
-server

This is occurring due to the defaults that exist in the ES module. Essentially, unless you manually specify the correct option here, it will automatically add a broken option for you. That means changing the options in the Chassis ES manifest won't help; you need to explicitly set them somewhere. Great usability from puppet-elasticsearch there 👍 (See also voxpupuli/puppet-elasticsearch#1032)

Setting all the settings manually will likely work; as @BronsonQuick notes though, you also need to restart the machine if you're bumping the machine's RAM settings. Applying the changes causes ES to fail starting otherwise, but with a different error that you can miss if you're not watching closely:

Apr 08 09:55:41 altis elasticsearch[1849]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000081800000, 2122317824, 0) failed; error='Not enough space' (errno=12)                
Apr 08 09:55:41 altis elasticsearch[1849]: #                                                  
Apr 08 09:55:41 altis elasticsearch[1849]: # There is insufficient memory for the Java Runtime Environment to continue.
Apr 08 09:55:41 altis elasticsearch[1849]: # Native memory allocation (mmap) failed to map 2122317824 bytes for committing reserved memory.  

After rebooting the machine, vagrant provision works and ES does too.


So, to summarise:

  • If you're override any JDK options, you must include the defaults as well, or puppet-elasticsearch will add broken ones automatically.
  • To bump memory in ES, set the VM's memory high enough to allow that and ensure you have restarted the machine after changing this. A rule of thumb is 2x ES' memory.

The following configuration should work:

elasticsearch:
    jvm_options:
        # Bump memory for ES to 2GB
        - "-Xms2024m"
        - "-Xmx2024m"
        # Include defaults for ES
        - "8:-XX:NumberOfGCLogFiles=32"
        - "8:-XX:GCLogFileSize=64m"
        - "8:-XX:+UseGCLogFileRotation"
        - "8:-XX:+PrintTenuringDistribution"
        - "8:-XX:+PrintGCDateStamps"
        - "8:-XX:+PrintGCApplicationStoppedTime"
        - "8:-XX:+UseConcMarkSweepGC"
        - "8:-XX:+UseCMSInitiatingOccupancyOnly"
        - "-XX:+UseG1GC"
        - "11:-XX:InitiatingHeapOccupancyPercent=75"

# Bump VM memory as well
virtualbox:
    memory: 4048
    cpus: 2

We should document this.

@roborourke
Copy link
Contributor

Ah I see regarding the deep_merge, it replaces the array 🤦‍♂ - I think I'll update this to add those default settings after merging the settings to avoid that issue.

@rmccue
Copy link
Contributor

rmccue commented Apr 8, 2020

For testing btw, here's the key things you need:

  1. When updating the config, ensure you reboot the machine if changing virtualbox settings
  2. Wipe Elasticsearch before reprovisioning, as files are not always cleaned up otherwise; specifically: sudo apt-get remove elasticsearch, rm -rf /etc/elasticsearch/ /usr/share/elasticsearch/ /etc/default/elasticsearch /etc/default/elasticsearch-es
  3. Check the generated files at /etc/elasticsearch/es/jvm.options, not just the inputs
  4. Read the ES logs directly via journalctl -u elasticsearch-es.service as Puppet doesn't output all of them

If you miss any of these steps, you can run into false positives and negatives super easy.

@roborourke
Copy link
Contributor

roborourke commented Apr 8, 2020

@BronsonQuick @rmccue what are your thoughts on providing the full default jvm_options list but also having a new config option to set the memory limit and use that to generate the -Xmx and Xms values?

@rmccue
Copy link
Contributor

rmccue commented Apr 8, 2020

@roborourke You read my mind, I think that's the best option here.

@mikeselander
Copy link
Author

Ah, had not considered the memory issue! Thanks gents.

roborourke added a commit that referenced this issue Apr 8, 2020
Fixes #22

Right now using `deep_merge` results in the JVM options array being overwritten completely. The default options added since updating this module to use elasticstack and Java 11 result in the service not being able to start depending on the version of Elasticsearch selected.

This ensures that the default JVM options are scoped to the Java versions they work with and that any overrides do not cause these safe defaults to be overwritten completely unless it is intentional.

The update is backwards compatible with installations that set the `-Xmx` and `-Xms` JVM options using the legacy method and adds a new simpler option `memory` that will set these options for you.
@roborourke
Copy link
Contributor

Ok, created #23 to fix this finally - it ensures the defaults we added are always there but still allows overriding via jvm_options even if all of them aren't present.

It's backwards compatible and also adds a new memory option to just set an integer and not faff with JVM options at all.

Would appreciate it if you have a chance to test it out @mikeselander

I think another good improvement would be to try and warn or throw an error if the VM memory is not sufficient, would require parsing the size out of the -Xmx and -Xms values though which is a bit fraught as you can use values like 2g or 512m etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants