From 17790909fc224271750d159b9159090fc3a69afb Mon Sep 17 00:00:00 2001
From: annasojkapal <107477565+annasojkapal@users.noreply.github.com>
Date: Thu, 19 Oct 2023 16:52:40 +0200
Subject: [PATCH] REL-878227 long text fields encoding configuration (#18)

* REL-878227 long text fields encoding configuration
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 11a8ca0..867f460 100644
--- a/README.md
+++ b/README.md
@@ -1139,7 +1139,7 @@ List of samples:
 
 ## Import Job Settings
 ### Encoding
-For improved performance when dealing with fileshare data on ADLS, we highly recommend using extracted text or other long text files encoded in UTF-16. By doing so, you can avoid the need for conversion to the correct encoding, leading to significant time savings in your document and image workflows.
+For best performance, we highly recommend using UTF-16 encoding for any single long text field (including Extracted text). Other encodings are still supported, but will be converted to UTF-16 which will add delay to document or image import process.
 
 For the document workflow, set **FieldMapping.Encoding** to UTF-16. Similarly, for the image workflow, configure **ImageSettings.ExtractedTextEncoding** as UTF-16. With these settings in place, the conversion overhead is eliminated, and your files will be copied directly in the unicode encoding, resulting in faster processing times.
 
@@ -1169,6 +1169,7 @@ For the document workflow, set **FieldMapping.Encoding** to UTF-16. Similarly, f
     .WithoutFieldsMapped()
     .WithoutFolders();
 
+If your mapping contains more than one long text field, you should use UTF-16. No other encodings are supported in this case.
 ### FileSizeColumnIndex
 Another valuable setting that can enhance performance is the **FieldMapping.FileSizeColumnIndex**. By configuring this setting, the need for additional file size calculations can be eliminated.
 The file sizes will be automatically extracted from the load file, streamlining the process and saving valuable processing time.