The MongoDB data transfer extension provides source and sink capabilities for reading from and writing to a MongoDB database.
Note: When specifying the MongoDB extension as the Source or Sink property in configuration, use the name MongoDB.
Both source and sink settings require the ConnectionString and DatabaseName parameters. The source takes an optional Collection parameter; if it is not set, the source reads from all collections. The sink requires the Collection parameter and inserts all records received from a source into that collection. The sink also accepts an optional BatchSize parameter (default value is 100) that controls how many records are written per batch.
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": ""
}
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": ""
}
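The batching behavior controlled by BatchSize can be pictured with a minimal sketch. This is only an illustration of grouping records into fixed-size writes, not the extension's actual implementation; the helper name `batched` and the sample records are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(records: Iterable[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# 250 records with the default size of 100 are grouped into writes of 100, 100, and 50.
sizes = [len(b) for b in batched({"n": i} for i in range(250))]
print(sizes)  # [100, 100, 50]
```

Under this reading, a larger BatchSize means fewer, larger writes to the collection, and a smaller one means more, smaller writes.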
The MongoDB Vector extension is a sink-only extension that builds on the MongoDB extension by providing additional capabilities for generating embeddings using Azure OpenAI APIs.
Note: When specifying the MongoDB Vector extension as the Sink property in configuration, use the name MongoDB-Vector(beta).
If using CSFLE (Client Side Field Level Encryption), the extension supports automatic decryption when the following parameters are provided:
KeyVaultNamespace: Database and collection holding the Key Vault and Keys details. Format: database.collection
KMSProviders: Key Management Service providers for the Key Vault. For Azure Key Vault support, the following parameters are required:
    tenantId: The Azure Active Directory tenant ID
    clientId: The Azure Active Directory application client ID
    clientSecret: The Azure Active Directory application client secret
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"KeyVaultNamespace": "",
"KMSProviders": {
"azure": {
"tenantId": "",
"clientId": "",
"clientSecret": ""
}
}
}
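Before running a transfer, it can help to sanity-check the CSFLE settings. The sketch below is one way to do that, assuming the settings have already been loaded into a dictionary shaped like the example above. The helper names are hypothetical and are not part of the extension.

```python
from typing import Dict, List, Tuple

# The three Azure fields listed as required above.
REQUIRED_AZURE_KEYS = ("tenantId", "clientId", "clientSecret")


def split_key_vault_namespace(namespace: str) -> Tuple[str, str]:
    """Split a 'database.collection' string at the first dot."""
    database, sep, collection = namespace.partition(".")
    if not sep or not database or not collection:
        raise ValueError("KeyVaultNamespace must have the form database.collection")
    return database, collection


def missing_azure_keys(settings: Dict) -> List[str]:
    """Return the required Azure KMS fields that are absent or empty."""
    azure = settings.get("KMSProviders", {}).get("azure", {})
    return [key for key in REQUIRED_AZURE_KEYS if not azure.get(key)]


# Placeholder values for illustration only.
settings = {
    "KeyVaultNamespace": "encryption.__keyVault",
    "KMSProviders": {"azure": {"tenantId": "t", "clientId": "c", "clientSecret": ""}},
}
print(split_key_vault_namespace(settings["KeyVaultNamespace"]))  # ('encryption', '__keyVault')
print(missing_azure_keys(settings))  # ['clientSecret']
```

Splitting at the first dot reflects that MongoDB database names cannot contain a dot, while collection names can.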
The settings are based on the MongoDB extension settings with additional parameters for generating embeddings.
The sink settings require the following additional parameters:
GenerateEmbedding: If set to true, the sink generates embeddings for the records before writing them to the database. The following parameters are required when this is true:
    OpenAIUrl: The URL of the OpenAI API
    OpenAIKey: The API key for the OpenAI API
    OpenAIDeploymentModel: The deployment model to use for the OpenAI API
    SourcePropEmbedding: The property in the source data that is used to generate the embeddings
    DestPropEmbedding: The name of the new property that is added to the source data with the generated embeddings
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"BatchSize": 100,
"GenerateEmbedding": true,
"OpenAIUrl": "",
"OpenAIKey": "",
"OpenAIDeploymentModel": "",
"SourcePropEmbedding": "",
"DestPropEmbedding": ""
}
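The roles of SourcePropEmbedding and DestPropEmbedding can be illustrated with a small sketch. It assumes each record is a flat dictionary and substitutes a placeholder function for the Azure OpenAI call, so it shows only how a record changes shape, not how the extension generates embeddings. The names `add_embedding` and `fake_embed` are hypothetical.

```python
from typing import Callable, Dict, List


def add_embedding(
    record: Dict,
    source_prop: str,
    dest_prop: str,
    embed: Callable[[str], List[float]],
) -> Dict:
    """Return a copy of record with dest_prop set to the embedding of record[source_prop]."""
    enriched = dict(record)
    enriched[dest_prop] = embed(str(record[source_prop]))
    return enriched


def fake_embed(text: str) -> List[float]:
    # Placeholder only: NOT a real embedding. A real sink would call Azure OpenAI here.
    return [float(len(text))]


doc = add_embedding({"title": "hello"}, "title", "titleVector", fake_embed)
print(doc)  # {'title': 'hello', 'titleVector': [5.0]}
```

In this sketch a record without the source property raises a KeyError. How the extension itself handles that case is not stated here.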