Preserve ctf_{sequence,array}_text string data even if they contain NULL characters #114
base: stable-2.0
Hi @Kerilk! I'm sorry, but the Babeltrace 1.x release series won't see new releases beyond bug fixes.
This change is for Babeltrace 2.0.
Indeed, we need to decide whether we want to reinstate this bt1 behavior in bt2.
To elaborate a bit, we use it to save blobs of memory that we can easily cast back to structured types. |
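As a minimal illustration of "casting a blob back to a structured type", here is a Ruby sketch using only the standard library. The three-field layout is hypothetical (a reduced subset of a device-properties struct), assumed little-endian with no padding. Note that the encoded bytes contain NUL bytes, which is exactly what the `*_text` variants would otherwise cut off.

```ruby
# Hypothetical reduced struct: uint32 vendorId, uint32 deviceId,
# uint64 maxMemAllocSize; assumed little-endian, no padding.
DeviceInfo = Struct.new(:vendor_id, :device_id, :max_mem_alloc_size)

# "L<" = little-endian uint32, "Q<" = little-endian uint64.
DEVICE_INFO_FORMAT = "L<L<Q<".freeze
DEVICE_INFO_SIZE = 16 # 4 + 4 + 8 bytes

# Simulates what the tracer records: the raw bytes of the struct.
def encode_device_info(info)
  info.to_a.pack(DEVICE_INFO_FORMAT)
end

# Casts the recorded bytes back into a Ruby struct.
def decode_device_info(blob)
  unless blob.bytesize == DEVICE_INFO_SIZE
    raise ArgumentError, "expected #{DEVICE_INFO_SIZE} bytes, got #{blob.bytesize}"
  end
  DeviceInfo.new(*blob.unpack(DEVICE_INFO_FORMAT))
end

blob = encode_device_info(DeviceInfo.new(0x8086, 0x0BD5, 4 << 30))
decoded = decode_device_info(blob)
puts decoded.vendor_id.to_s(16) # prints 8086
```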
Why can't you use ctf_sequence or ctf_array for this? The *_text variants are meant to trace zero-terminated strings.
Because, as far as I can tell, I can't get a pointer back to the data without iterating over all the elements of the array and making an additional copy, which is painful and slow, especially from high-level languages (Ruby, Python, etc.).
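A rough sketch of the workaround being described, assuming (for illustration only) that a binding hands back an array or sequence of uint8 as one Ruby Integer per element. Recovering a contiguous blob then means walking every element and allocating a new String:

```ruby
# Assumption: the binding exposes each uint8 element as a Ruby Integer.
# Rebuilding the blob iterates over every element and copies it into a
# freshly allocated binary String.
def blob_from_elements(elements)
  elements.pack("C*")
end

elements = [0x86, 0x80, 0x00, 0x00, 0xD5, 0x0B, 0x00, 0x00]
blob = blob_from_elements(elements)
p blob.bytesize       # prints 8
p blob.unpack("L<L<") # prints [32902, 3029]
```

For a multi-megabyte payload, this per-element round trip through the binding is the cost a direct pointer to the data would avoid.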
Thanks for the added clarifications, and sorry for closing the issue earlier; I completely misinterpreted the description. I understand the overall use case for accessing "raw" binary payloads directly. The proposed change doesn't work, as it violates a precondition of … A check for this is performed here: … This check is only performed in "developer mode" for performance reasons, which I guess is why you didn't hit it. I recommend you configure your …
Anyhow, this would be an abuse of the API that I would rather not encourage, even if it did work. I have discussed this with @eepp and, so far, we both think that a new "blob" field class would fit best in the current API. This field class would be used when a source identifies a field as an "unstructured" binary payload, and it would provide a means to access the data directly (i.e. …). I still have to give this some thought to flesh out the implications, and I am open to suggestions. To understand your use case a bit better: …
I'm trying to figure out how the Python bindings can expose this in a natural and efficient way, and how "smart" they have to be to provide acceptable performance. Also, are the Ruby bindings you are using publicly available? Thanks!
Sorry for the lengthy response.

General Remarks

I think your proposed solution is the best one. I can keep using the modified ctf plugin in the meantime. Adopting the new feature will break compatibility with our Babeltrace 1-based tools, but I can rebuild them around Babeltrace 2. Thanks for the tip about the …

Use Case

Our use case is model-centric tracing of heterogeneous APIs such as OpenCL, CUDA, or Level Zero (see the bottom for more details). The idea is to dump not only the arguments of the API calls but also the data behind pointers. We try to stay reasonable: we use file I/O to dump larger objects like buffers or compiled programs, and use LTTng to record the path. Out of curiosity, we tried dumping those fields with LTTng events, but we never managed to keep the daemon from dropping messages when message sizes approached 1 GiB.

About Bindings

The Ruby bindings I am using are for Babeltrace 1 and can be found here: …
The most efficient way for me to map structures in Ruby (this also applies to Python) is to wrap the pointer returned by Babeltrace in an FFI Pointer and use that pointer as the backing for an FFI struct. Here is an extract of my babeltrace_ze tool, which reads Level Zero LTTng traces (zeDeviceGetProperties):
In Ruby, the generated code looks like this:

```ruby
# Generated FFI layout mirroring the Level Zero ze_device_properties_t struct.
class ZEDeviceProperties < FFI::ZEStruct
  layout :stype, :ze_structure_type_t,
         :pNext, :pointer,
         :type, :ze_device_type_t,
         :vendorId, :uint32_t,
         :deviceId, :uint32_t,
         :flags, :ze_device_property_flags_t,
         :subdeviceId, :uint32_t,
         :coreClockRate, :uint32_t,
         :maxMemAllocSize, :uint64_t,
         :maxHardwareContexts, :uint32_t,
         :maxCommandQueuePriority, :uint32_t,
         :numThreadsPerEU, :uint32_t,
         :physicalEUSimdWidth, :uint32_t,
         :numEUsPerSubslice, :uint32_t,
         :numSubslicesPerSlice, :uint32_t,
         :numSlices, :uint32_t,
         :timerResolution, :uint64_t,
         :timestampValidBits, :uint32_t,
         :kernelTimestampValidBits, :uint32_t,
         :uuid, :ze_device_uuid_t,
         :name, [ :char, 256 ]
end

# Pretty-printer for the event; the lambda returns s (the value of the last <<).
$event_lambdas["lttng_ust_ze:zeDeviceGetProperties_stop"] = lambda { |defi|
  s = "{ "
  s << "zeResult: #{ZE::ZEResult.from_native(defi["zeResult"], nil)}"
  s << ', '
  # MemoryPointer.from_string copies the string's bytes into native memory,
  # which then backs the struct.
  s << "pDeviceProperties_val: #{defi["pDeviceProperties_val"].size > 0 ? ZE::ZEDeviceProperties.new(FFI::MemoryPointer.from_string(defi["pDeviceProperties_val"])) : nil}"
  s << " }"
}
```

Note that with the proposed feature, I would not need to get the pointer back from the string, but could directly map the struct onto the returned pointer. The same approach is used in C to build a Babeltrace 2 event dispatcher that allows introspection into the API structs. Here is an example of using ctypes in Python to achieve the same kind of result: https://xgitlab.cels.anl.gov/videau/cconfigspace/-/blob/master/bindings/python/cconfigspace/base.py#L260-273 (note that unions are somewhat broken in ctypes).

THAPI

The whole THAPI tracing project is hosted here: …
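To illustrate the idea of mapping a struct layout onto a raw buffer (as done above with an FFI struct backed by a pointer) without external gems, here is a stdlib-only sketch. `BlobStructView` and its layout are hypothetical; offsets assume a packed little-endian layout. Each field is decoded on access at its fixed offset rather than unpacking the whole blob up front.

```ruby
# Hypothetical read-only "struct view" over a binary String.
class BlobStructView
  # field name => [byte offset, byte size, String#unpack directive]
  LAYOUT = {
    vendor_id:          [0, 4, "L<"],
    device_id:          [4, 4, "L<"],
    max_mem_alloc_size: [8, 8, "Q<"],
  }.freeze
  SIZE = 16

  def initialize(blob)
    raise ArgumentError, "blob too short: #{blob.bytesize} bytes" if blob.bytesize < SIZE
    @blob = blob
  end

  # Decodes a single field; only that field's bytes are sliced out.
  def [](name)
    offset, size, directive = LAYOUT.fetch(name)
    @blob.byteslice(offset, size).unpack1(directive)
  end
end

view = BlobStructView.new([0x8086, 0x0BD5, 4 << 30].pack("L<L<Q<"))
puts view[:max_mem_alloc_size] # prints 4294967296
```

An FFI struct goes one step further by reading directly from native memory; this sketch only shows the offset-based access pattern.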
Thanks a lot for the detailed answer.
Hmm, Babeltrace would have to be a bit smarter to accommodate payloads of that size; copying them into the … For the moment, I guess even a naive copy approach would help get you going. As for LTTng, the tracers will not save event payloads that are larger than a sub-buffer, as payloads cannot span more than one sub-buffer. LTTng isn't really tuned for those kinds of payload sizes, but if you make sure to configure sub-buffers larger than your expected payload, it should work. See https://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count. You may also want to look at lttng-ust's blocking mode (https://lttng.org/docs/#doc-blocking-timeout-example) to give the consumer daemon enough time to extract those payloads to disk.
Great! Looking forward to it 😃 I will have a look at THAPI; it sounds pretty interesting.
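As a back-of-the-envelope aid for the sub-buffer advice above, here is a sketch that picks the smallest sub-buffer size able to hold one event. The per-event overhead is a caller-supplied guess, not a figure from LTTng, and the sizes are assumed to be powers of two starting at the page size; check the `lttng enable-channel` documentation for the exact rules.

```ruby
# Assumptions: overhead_bytes is an estimate of event header/metadata cost;
# sub-buffer sizes are powers of two, no smaller than page_size.
def min_subbuf_size(payload_bytes, overhead_bytes: 64, page_size: 4096)
  needed = payload_bytes + overhead_bytes
  size = page_size
  size <<= 1 while size < needed # double until the event fits
  size
end

puts min_subbuf_size(100_000) # prints 131072
puts min_subbuf_size(1 << 20) # prints 2097152
```

The second call shows why a payload of exactly 1 MiB needs a 2 MiB sub-buffer once any overhead is added.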
Follow-up: the recent CTF 2 specification proposal revision includes static-length and dynamic-length BLOB field classes. If it is accepted, you'll be able to use such a field class in CTF 2, instead of static-length and dynamic-length array field classes, to describe BLOB fields. A CTF 2 BLOB field class also has an associated IANA media type. In Babeltrace 2: …
Will this satisfy your use cases, @Kerilk?
Yes, it will; thanks a lot for making this proposal. On my side, I am almost done binding Babeltrace 2 in Ruby. When I am done, I will send a message to the mailing list about any problems I have encountered, as none of the issues I have found so far need urgent fixes.
…to the resulting string and not stop at the first '\0' character.
Hello @eepp, any news on the CTF 2 adoption?
This patch restores the Babeltrace 1.5 behavior in Babeltrace 2 regarding the handling of ctf_sequence_text and ctf_array_text: the original bytes/length provided during tracing are reflected in the resulting string, irrespective of it containing '\0' characters. If the project is willing to accept the pull request, I could devise a test to ensure this behavior is preserved in the future.
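The difference between the two behaviors can be sketched in a few lines. This only models the semantics (the function names are hypothetical, not Babeltrace API): C-string semantics end the value at the first NUL byte, while the patched behavior keeps every recorded byte.

```ruby
# C-string semantics: the value stops at the first NUL byte.
def as_c_string(bytes)
  nul = bytes.index("\0")
  nul ? bytes.byteslice(0, nul) : bytes
end

# Length-preserving semantics: all recorded bytes are kept.
def as_recorded(bytes, length)
  bytes.byteslice(0, length)
end

recorded = [0x86, 0x80, 0x00, 0x00, 0x41].pack("C*")
p as_c_string(recorded).bytesize                     # prints 2
p as_recorded(recorded, recorded.bytesize).bytesize  # prints 5
```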