Checked for duplicates
Yes - I've already checked
Describe the bug
There are a total of 1433 frames in the DISP-S1 historical database JSON that we must process. If we put all those frames into a single batch_proc, ES calls fail with the error message:

`illegal_argument_exception: Limit of total fields [1000] has been exceeded`
This happens because, by default, ES indices can have up to 1000 fields, including subfields. The error occurs not when the batch_proc is created but when the batch application updates the batch_proc with the state information. The state information is a large dictionary of {frame: state}, and the keys of this dict are mapped as subfields in ES. There are a couple of ways we can fix this:
Option 1: Increase the maximum field limit to 2000 for the batch_proc index only. This works and is limited to just this one index, which will not hold much data. The batch app can make this change when it first runs. Very light and seamless.

```shell
curl -s -XPUT http://grq:9200/batch_proc/_settings -H 'Content-Type: application/json' -d '{"index.mapping.total_fields.limit": 2000}'
```
This could be considered ever so slightly risky because the field limit is there for a reason, but the change would apply only to this one index, which will never contain a significant amount of data.
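If the batch app applies this setting itself at startup, the change could look something like the sketch below. This is a hypothetical helper, not the actual batch app code; the index name `batch_proc` and the `grq:9200` endpoint are taken from the curl example above, and `es` is assumed to be an elasticsearch-py client.

```python
def field_limit_settings(limit: int = 2000) -> dict:
    """Build the settings payload equivalent to the curl command above."""
    return {"index.mapping.total_fields.limit": limit}

def raise_field_limit(es, index: str = "batch_proc", limit: int = 2000) -> None:
    """Raise the total-fields limit on a single index (sketch).

    `es` is assumed to be an elasticsearch-py client, e.g.
    Elasticsearch("http://grq:9200"). put_settings applies the change only
    to the named index, leaving cluster-wide defaults untouched.
    """
    es.indices.put_settings(index=index, body=field_limit_settings(limit))
```

Making this idempotent is cheap: re-applying the same settings value is a no-op as far as the index is concerned, so the batch app can call it unconditionally on every startup.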
Option 2: If we deem option 1 too risky, the second option is to change the data structure used to store the frame state information to something other than a map: a single list, two lists (keys and values), one big string, etc. We can still transform it into a map in the application, and that is a small, trivial operation, so there is no performance downside. The downside of this option is that it impacts the code in a nontrivial way, which requires deeper testing and more time. Also, the view of the data through the `pcm_batch.py` tool will not be as friendly to the operator.

My recommendation is option 1.
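For illustration, the two-lists variant of option 2 could be as small as the sketch below. The function names and document shape are assumptions, not the actual batch_proc schema; the point is that ES then sees only two fields (`frames` and `states`) regardless of how many frames there are, instead of one subfield per frame key.

```python
def to_lists(frame_states: dict) -> dict:
    """Flatten {frame: state} into a two-field document for ES storage."""
    frames = list(frame_states)
    return {"frames": frames, "states": [frame_states[f] for f in frames]}

def to_map(doc: dict) -> dict:
    """Rebuild the {frame: state} view inside the application."""
    return dict(zip(doc["frames"], doc["states"]))
```

The round trip is lossless and O(n) in the number of frames, which is why the performance cost is negligible; the real cost is touching every code path (and operator tooling) that currently expects the map shape.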