feat(Gong) - Add transcripts sync #11231

Merged on Mar 5, 2025

Commits (46)
f200516
add method getCallsMetadata
aubin-tchoi Mar 5, 2025
eba06fb
add incomplete TranscriptModel
aubin-tchoi Mar 5, 2025
22e499d
add callId to TranscriptModel
aubin-tchoi Mar 5, 2025
4b47f55
add function to upsert transcript
aubin-tchoi Mar 5, 2025
0e87e71
add the transcript upserts
aubin-tchoi Mar 5, 2025
c88eed7
add missing fields from gong_transcripts table
aubin-tchoi Mar 5, 2025
a4ddebe
add the email addresses to the tags
aubin-tchoi Mar 5, 2025
7c11ea7
add GongTranscriptModel to the synced models
aubin-tchoi Mar 5, 2025
3b64095
add index on connectorId, callId
aubin-tchoi Mar 5, 2025
fdfefa9
add migration file
aubin-tchoi Mar 5, 2025
8e02302
add a top level folder with all the transcripts
aubin-tchoi Mar 5, 2025
a7cad16
separate the two migration files
aubin-tchoi Mar 5, 2025
8fa2e93
lint
aubin-tchoi Mar 5, 2025
7dc7b29
lint
aubin-tchoi Mar 5, 2025
7c54fe7
prevent selecting the Gong transcripts folder
aubin-tchoi Mar 5, 2025
d8b5f76
remove the empty function shouldSyncTranscript (we're not doing it)
aubin-tchoi Mar 5, 2025
38c5709
rename speakers into participants
aubin-tchoi Mar 5, 2025
36e025f
remove unnecessary fields from GongParticipantCodec
aubin-tchoi Mar 5, 2025
6af1b4e
increment migration file index
aubin-tchoi Mar 5, 2025
73753b8
Merge branch 'main' into gong-transcripts
aubin-tchoi Mar 5, 2025
bb9a7f9
add todo
aubin-tchoi Mar 5, 2025
6cfb19c
add participant metadata
aubin-tchoi Mar 5, 2025
bd65280
lint
aubin-tchoi Mar 5, 2025
35551e2
update limitations warning
aubin-tchoi Mar 5, 2025
8cccc34
fix document content structure
aubin-tchoi Mar 5, 2025
6055163
fix how tags are preprocessed
aubin-tchoi Mar 5, 2025
f3b2630
prevent repeating the tag name in the prefix
aubin-tchoi Mar 5, 2025
87c1df6
exhaust the cursor when getting transcript metadata
aubin-tchoi Mar 5, 2025
56eb911
delete users and transcripts on strategy delete
aubin-tchoi Mar 5, 2025
25aeced
fix GongParticipantCodec, making userId not nullable
aubin-tchoi Mar 5, 2025
6546107
remove unnecessary index
aubin-tchoi Mar 5, 2025
6a62ad9
add variable TRANSCRIPTS_FOLDER_TITLE
aubin-tchoi Mar 5, 2025
84589da
extract variables for the document content
aubin-tchoi Mar 5, 2025
3d31fea
add missing fields from GongTranscriptResource.toJSON
aubin-tchoi Mar 5, 2025
1175fce
remove unused method
aubin-tchoi Mar 5, 2025
4facc2f
prevent refetches of the OAuth token
aubin-tchoi Mar 5, 2025
c6392a9
refactor: move getGongAccessToken into utils
aubin-tchoi Mar 5, 2025
a3ceb8a
Revert "refactor: move getGongAccessToken into utils"
aubin-tchoi Mar 5, 2025
203ba83
Revert "prevent refetches of the OAuth token"
aubin-tchoi Mar 5, 2025
7402be7
add the speaker's email provided in the call/extensive endpoint as a …
aubin-tchoi Mar 5, 2025
fa2e424
fix codec for gong participants
aubin-tchoi Mar 5, 2025
f643f33
add comment
aubin-tchoi Mar 5, 2025
18a2e9d
add a comment
aubin-tchoi Mar 5, 2025
b061f9e
add another comment
aubin-tchoi Mar 5, 2025
e5abd3c
remove unused type
aubin-tchoi Mar 5, 2025
c2fcece
add the direction to the tags
aubin-tchoi Mar 5, 2025
13 changes: 13 additions & 0 deletions connectors/migrations/db/migration_60.sql
@@ -0,0 +1,13 @@
-- Migration created on Mar 05, 2025
CREATE TABLE IF NOT EXISTS "gong_transcripts"
(
"createdAt" TIMESTAMP WITH TIME ZONE NOT NULL,
"updatedAt" TIMESTAMP WITH TIME ZONE NOT NULL,
"callId" TEXT NOT NULL,
"title" TEXT NOT NULL,
Contributor: Can the title be longer than 255 characters?

Contributor Author: We don't have any occurrences in our Gong, but I would assume it's not really bounded.

Contributor: Let's truncate in the resource then.

"url" TEXT NOT NULL,
"connectorId" BIGINT NOT NULL REFERENCES "connectors" ("id") ON DELETE RESTRICT ON UPDATE CASCADE,
"id" BIGSERIAL,
PRIMARY KEY ("id")
);
CREATE UNIQUE INDEX "gong_transcripts_connector_id_call_id" ON "gong_transcripts" ("connectorId", "callId");
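
The review thread above settles on truncating the title in the resource rather than constraining the column. A minimal sketch of what such a helper could look like follows. The name `truncateTitle` and the 255-character cap are assumptions for illustration; the PR does not show the helper or the limit that was actually chosen.

```typescript
// Hypothetical cap: the review mentions 255 characters, but the PR does not
// state which limit (if any) was finally applied.
const MAX_TITLE_LENGTH = 255;

// Truncates by Unicode code points so that a surrogate pair (e.g. an emoji)
// is never cut in half.
function truncateTitle(
  title: string,
  maxLength: number = MAX_TITLE_LENGTH
): string {
  const codePoints = Array.from(title);
  if (codePoints.length <= maxLength) {
    return title;
  }
  return codePoints.slice(0, maxLength).join("");
}

const shortTitle = truncateTitle("Weekly sync");
const longTitle = truncateTitle("a".repeat(300));
```

Truncating before the row is written keeps the `TEXT` column unbounded at the schema level while still protecting downstream consumers from very long titles.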
2 changes: 2 additions & 0 deletions connectors/src/admin/db.ts
@@ -17,6 +17,7 @@ import {
} from "@connectors/lib/models/github";
import {
GongConfigurationModel,
GongTranscriptModel,
GongUserModel,
} from "@connectors/lib/models/gong";
import {
@@ -136,6 +137,7 @@ async function main(): Promise<void> {
await ZendeskTicketModel.sync({ alter: true });
await SalesforceConfigurationModel.sync({ alter: true });
await GongConfigurationModel.sync({ alter: true });
await GongTranscriptModel.sync({ alter: true });
await GongUserModel.sync({ alter: true });

// enable the `unaccent` extension
17 changes: 16 additions & 1 deletion connectors/src/connectors/gong/index.ts
@@ -1,6 +1,7 @@
import type { ContentNode, Result } from "@dust-tt/types";
-import { Err, Ok } from "@dust-tt/types";
+import { Err, MIME_TYPES, Ok } from "@dust-tt/types";

import { makeGongTranscriptFolderInternalId } from "@connectors/connectors/gong/lib/internal_ids";
import {
launchGongSyncWorkflow,
stopGongSyncWorkflow,
@@ -12,11 +13,15 @@ import type {
UpdateConnectorErrorCode,
} from "@connectors/connectors/interface";
import { BaseConnectorManager } from "@connectors/connectors/interface";
import { dataSourceConfigFromConnector } from "@connectors/lib/api/data_source_config";
import { upsertDataSourceFolder } from "@connectors/lib/data_sources";
import logger from "@connectors/logger/logger";
import { ConnectorResource } from "@connectors/resources/connector_resource";
import { GongConfigurationResource } from "@connectors/resources/gong_resources";
import type { DataSourceConfig } from "@connectors/types/data_source_config";

const TRANSCRIPTS_FOLDER_TITLE = "Transcripts";

export class GongConnectorManager extends BaseConnectorManager<null> {
static async create({
dataSourceConfig,
@@ -36,6 +41,16 @@ export class GongConnectorManager extends BaseConnectorManager<null> {
{}
);

// Upsert a top-level folder that will contain all the transcripts (non selectable).
await upsertDataSourceFolder({
dataSourceConfig: dataSourceConfigFromConnector(connector),
folderId: makeGongTranscriptFolderInternalId(connector),
parents: [makeGongTranscriptFolderInternalId(connector)],
parentId: null,
title: TRANSCRIPTS_FOLDER_TITLE,
mimeType: MIME_TYPES.GONG.TRANSCRIPT_FOLDER,
});

const result = await launchGongSyncWorkflow(connector);
if (result.isErr()) {
logger.error(
91 changes: 89 additions & 2 deletions connectors/src/connectors/gong/lib/gong_api.ts
@@ -49,6 +49,52 @@ const GongCallTranscriptCodec = t.type({
transcript: t.array(GongTranscriptMonologueCodec),
});

export type GongCallTranscript = t.TypeOf<typeof GongCallTranscriptCodec>;

export const GongParticipantCodec = t.intersection([
t.type({
speakerId: t.union([t.string, t.null]),
userId: t.union([t.string, t.undefined]),
emailAddress: t.union([t.string, t.undefined]),
}),
CatchAllCodec,
]);

const GongTranscriptMetadataCodec = t.intersection([
t.type({
metaData: t.intersection([
t.type({
id: t.string,
url: t.string,
primaryUserId: t.string,
direction: t.union([
t.literal("Inbound"),
t.literal("Outbound"),
t.literal("Conference"),
t.literal("Unknown"),
]),
scope: t.union([
t.literal("Internal"),
t.literal("External"),
t.literal("Unknown"),
]),
started: t.string, // ISO-8601 date (e.g., '2018-02-18T02:30:00-07:00').
duration: t.number, // The duration of the call, in seconds.
title: t.string,
media: t.union([t.literal("Video"), t.literal("Audio")]),
Contributor: I'd suggest removing some fields from here and keeping only the ones we use, to avoid validation errors.

language: t.string, // The language codes (as defined by ISO-639-2B): eng, fre, spa, ger, and ita.
}),
CatchAllCodec,
]),
parties: t.array(GongParticipantCodec),
}),
CatchAllCodec,
]);

export type GongTranscriptMetadata = t.TypeOf<
typeof GongTranscriptMetadataCodec
>;
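
The review comment above suggests validating only the fields the sync actually reads, so that an unexpected value in an unused field cannot fail the whole decode. The sketch below illustrates that idea with a hand-written type guard instead of io-ts, so that it stays self-contained. The names `MinimalCallMetadata` and `parseMinimalCallMetadata` are hypothetical, and the choice of retained fields is an assumption for illustration, not the set kept in the PR.

```typescript
// Assumed subset of fields; anything else in the payload is ignored.
interface MinimalCallMetadata {
  id: string;
  url: string;
  started: string;
  duration: number;
  title: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Hand-rolled stand-in for a trimmed-down codec (the PR itself uses io-ts).
// Returns null when a required field is missing or has the wrong type.
function parseMinimalCallMetadata(raw: unknown): MinimalCallMetadata | null {
  if (!isRecord(raw)) {
    return null;
  }
  const metaData = raw.metaData;
  if (!isRecord(metaData)) {
    return null;
  }
  const { id, url, started, duration, title } = metaData;
  if (
    typeof id !== "string" ||
    typeof url !== "string" ||
    typeof started !== "string" ||
    typeof duration !== "number" ||
    typeof title !== "string"
  ) {
    return null;
  }
  return { id, url, started, duration, title };
}

// A payload with an unexpected `media` value still parses, because `media`
// is not among the validated fields.
const parsed = parseMinimalCallMetadata({
  metaData: {
    id: "42",
    url: "https://example.com/call/42",
    started: "2018-02-18T02:30:00-07:00",
    duration: 3725,
    title: "Demo call",
    media: "Hologram",
  },
});
```

The trade-off is that fields left out of the validated set cannot be trusted later without re-checking them, so the set should match what the upsert code reads.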

// Generic codec for paginated results from Gong API.
const GongPaginatedResults = <C extends t.Mixed, F extends string>(
fieldName: F,
@@ -60,7 +106,7 @@ const GongPaginatedResults = <C extends t.Mixed, F extends string>(
records: t.type({
currentPageNumber: t.number,
currentPageSize: t.number,
-// Cursor only exists if there are more results.
+// The cursor only exists if there are more results.
cursor: t.union([t.string, t.undefined]),
totalRecords: t.number,
}),
@@ -190,6 +236,7 @@ export class GongClient {
return this.handleResponse(response, endpoint, codec);
}

// https://gong.app.gong.io/settings/api/documentation#post-/v2/calls/transcript
async getTranscripts({
startTimestamp,
pageCursor,
@@ -217,14 +264,15 @@ export class GongClient {
} catch (err) {
if (err instanceof GongAPIError && err.status === 404) {
return {
-pages: [],
+transcripts: [],
nextPageCursor: null,
};
}
throw err;
}
}

// https://gong.app.gong.io/settings/api/documentation#get-/v2/users
async getUsers({ pageCursor }: { pageCursor: string | null }) {
try {
const users = await this.getRequest(
@@ -260,4 +308,43 @@ export class GongClient {
throw err;
}
}

// https://gong.app.gong.io/settings/api/documentation#post-/v2/calls/extensive
async getCallsMetadata({
callIds,
pageCursor = null,
}: {
callIds: string[];
pageCursor?: string | null;
}) {
try {
const callsMetadata = await this.postRequest(
`/calls/extensive`,
{
cursor: pageCursor,
filter: {
callIds,
},
contentSelector: {
exposedFields: {
parties: true,
},
},
},
GongPaginatedResults("calls", GongTranscriptMetadataCodec)
);
return {
callsMetadata: callsMetadata.calls,
nextPageCursor: callsMetadata.records.cursor,
};
} catch (err) {
if (err instanceof GongAPIError && err.status === 404) {
Contributor: You should use isNotFoundError here.

Contributor Author: I see you did it in your PR 🙏

return {
callsMetadata: [],
nextPageCursor: null,
};
}
throw err;
}
}
}
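
One of the commits in this PR is "exhaust the cursor when getting transcript metadata", but the calling code is not part of the files shown here. The sketch below shows one way such a loop could be written around a `getCallsMetadata`-shaped function. `fetchAllCallsMetadata`, `FetchPage`, and the in-memory `fakeFetchPage` are hypothetical names introduced for illustration. Note that `nextPageCursor` can be `undefined` (Gong omits the cursor on the last page) or `null` (the 404 fallback), so both are treated as "no more pages".

```typescript
interface CallsMetadataPage<T> {
  callsMetadata: T[];
  // Undefined when the API omits the cursor, null on the 404 fallback.
  nextPageCursor?: string | null;
}

type FetchPage<T> = (args: {
  callIds: string[];
  pageCursor: string | null;
}) => Promise<CallsMetadataPage<T>>;

// Keeps requesting pages until no cursor is returned, accumulating results.
async function fetchAllCallsMetadata<T>(
  fetchPage: FetchPage<T>,
  callIds: string[]
): Promise<T[]> {
  const all: T[] = [];
  let pageCursor: string | null = null;
  do {
    const page = await fetchPage({ callIds, pageCursor });
    all.push(...page.callsMetadata);
    pageCursor = page.nextPageCursor ?? null;
  } while (pageCursor !== null);
  return all;
}

// In-memory stand-in for the Gong client: serves two pages, then stops.
const fakeFetchPage: FetchPage<string> = async ({ pageCursor }) =>
  pageCursor === null
    ? { callsMetadata: ["call-1", "call-2"], nextPageCursor: "page-2" }
    : { callsMetadata: ["call-3"] };

const allCalls = fetchAllCallsMetadata(fakeFetchPage, [
  "call-1",
  "call-2",
  "call-3",
]);
```

Passing the page fetcher as a parameter keeps the loop testable without a network call; in the connector it would presumably wrap `GongClient.getCallsMetadata`.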
14 changes: 14 additions & 0 deletions connectors/src/connectors/gong/lib/internal_ids.ts
@@ -0,0 +1,14 @@
import type { ConnectorResource } from "@connectors/resources/connector_resource";

export function makeGongTranscriptFolderInternalId(
connector: ConnectorResource
) {
return `gong-transcript-folder-${connector.id}`;
}

export function makeGongTranscriptInternalId(
connector: ConnectorResource,
callId: string
) {
return `gong-transcript-${connector.id}-${callId}`;
}
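
To make the ID scheme concrete, here is a small sketch of how the two helpers combine into the `parents` / `parentId` values passed to the document upsert later in this PR. `ConnectorLike` is a hypothetical stand-in for `ConnectorResource` (only `id` is used), and `makeTranscriptHierarchy` is a name introduced for illustration.

```typescript
// Minimal stand-in for ConnectorResource: only the id is needed here.
interface ConnectorLike {
  id: number;
}

function makeFolderId(connector: ConnectorLike): string {
  return `gong-transcript-folder-${connector.id}`;
}

function makeTranscriptId(connector: ConnectorLike, callId: string): string {
  return `gong-transcript-${connector.id}-${callId}`;
}

// `parents` lists the node itself first, then its ancestors up to the root.
// The transcripts folder is the single top-level ancestor.
function makeTranscriptHierarchy(connector: ConnectorLike, callId: string) {
  const documentId = makeTranscriptId(connector, callId);
  const folderId = makeFolderId(connector);
  return {
    documentId,
    parents: [documentId, folderId],
    parentId: folderId,
  };
}

const hierarchy = makeTranscriptHierarchy({ id: 7 }, "12345");
```

Embedding the connector ID in both identifiers keeps them unique when several Gong connectors share one data source namespace, which mirrors the unique index on `("connectorId", "callId")` in the migration.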
148 changes: 148 additions & 0 deletions connectors/src/connectors/gong/lib/upserts.ts
@@ -0,0 +1,148 @@
import { MIME_TYPES } from "@dust-tt/types";

import type {
GongCallTranscript,
GongTranscriptMetadata,
} from "@connectors/connectors/gong/lib/gong_api";
import {
makeGongTranscriptFolderInternalId,
makeGongTranscriptInternalId,
} from "@connectors/connectors/gong/lib/internal_ids";
import {
renderDocumentTitleAndContent,
renderMarkdownSection,
upsertDataSourceDocument,
} from "@connectors/lib/data_sources";
import logger from "@connectors/logger/logger";
import type { ConnectorResource } from "@connectors/resources/connector_resource";
import type { GongUserResource } from "@connectors/resources/gong_resources";
import { GongTranscriptResource } from "@connectors/resources/gong_resources";
import type { DataSourceConfig } from "@connectors/types/data_source_config";

/**
* Syncs a transcript in the db and upserts it to the data sources.
*/
export async function syncGongTranscript({
transcript,
transcriptMetadata,
participants,
speakerToEmailMap,
connector,
dataSourceConfig,
loggerArgs,
forceResync,
}: {
transcript: GongCallTranscript;
transcriptMetadata: GongTranscriptMetadata;
participants: GongUserResource[];
speakerToEmailMap: Record<string, string>;
connector: ConnectorResource;
dataSourceConfig: DataSourceConfig;
loggerArgs: Record<string, string | number | null>;
forceResync: boolean;
}) {
const { callId } = transcript;
const createdAtDate = new Date(transcriptMetadata.metaData.started);
const title = transcriptMetadata.metaData.title || "Untitled transcript";
const documentUrl = transcriptMetadata.metaData.url;

const transcriptInDb = await GongTranscriptResource.fetchByCallId(
callId,
connector
);

if (!forceResync && transcriptInDb) {
logger.info(
{
...loggerArgs,
callId,
},
"[Gong] Transcript already up to date, skipping sync."
);
return;
}

if (!transcriptInDb) {
await GongTranscriptResource.makeNew({
blob: {
connectorId: connector.id,
callId,
title,
url: documentUrl,
},
});
}

logger.info(
{
...loggerArgs,
callId,
createdAtDate,
},
"[Gong] Upserting transcript."
);

const hours = Math.floor(transcriptMetadata.metaData.duration / 3600);
const minutes = Math.floor(
(transcriptMetadata.metaData.duration % 3600) / 60
);
const callDuration = `${hours} hours ${minutes < 10 ? "0" + minutes : minutes} minutes`;

let markdownContent = `Meeting title: ${title}\n\nDate: ${createdAtDate.toISOString()}\n\nDuration: ${callDuration}\n\n`;

// Rebuild the transcript content with [User]: [sentence].
transcript.transcript.forEach((monologue) => {
let lastSpeakerId: string | null = null;
monologue.sentences.forEach((sentence) => {
if (monologue.speakerId !== lastSpeakerId) {
markdownContent += `# ${speakerToEmailMap[monologue.speakerId] || "Unknown speaker"}\n`;
lastSpeakerId = monologue.speakerId;
}
markdownContent += `${sentence.text}\n`;
});
});

const renderedContent = await renderMarkdownSection(
dataSourceConfig,
markdownContent
);
const documentContent = await renderDocumentTitleAndContent({
dataSourceConfig,
title,
content: renderedContent,
createdAt: createdAtDate,
additionalPrefixes: {
language: transcriptMetadata.metaData.language,
media: transcriptMetadata.metaData.media,
scope: transcriptMetadata.metaData.scope,
direction: transcriptMetadata.metaData.direction,
participants: participants.map((p) => p.email).join(", ") || "none",
},
});

const documentId = makeGongTranscriptInternalId(connector, callId);

await upsertDataSourceDocument({
dataSourceConfig,
documentId,
documentContent,
documentUrl,
timestampMs: createdAtDate.getTime(),
tags: [
`title:${title}`,
`createdAt:${createdAtDate.getTime()}`,
`language:${transcriptMetadata.metaData.language}`, // The language codes (as defined by ISO-639-2B): eng, fre, spa, ger, and ita.
`media:${transcriptMetadata.metaData.media}`,
`scope:${transcriptMetadata.metaData.scope}`,
`direction:${transcriptMetadata.metaData.direction}`,
...participants.map((p) => p.email),
Contributor: I feel like this leaves out the participants that are not internal to the connected Gong workspace. Can we still include them?

Contributor Author: Hmm, in any case these participants are retrieved using getGongUsers, meaning they are exposed by Gong's API, so I would say it's fine: they are not local to the meeting and exist somewhere in Gong.

],
parents: [documentId, makeGongTranscriptFolderInternalId(connector)],
parentId: makeGongTranscriptFolderInternalId(connector),
loggerArgs: { ...loggerArgs, callId },
upsertContext: { sync_type: "batch" },
title,
mimeType: MIME_TYPES.GONG.TRANSCRIPT,
async: true,
});
}
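
The duration formatting and the transcript body rendering in `syncGongTranscript` can be isolated into two pure functions, sketched below under a few assumptions. `formatCallDuration` and `renderTranscriptBody` are hypothetical names, and the `Monologue` shape is reduced to the two fields used here. This sketch is also a variation on the PR, not a copy of it: in the diff above `lastSpeakerId` is declared inside the per-monologue callback, so every monologue gets its own heading, whereas here it is tracked across monologues so that consecutive monologues by the same speaker share one heading.

```typescript
interface Sentence {
  text: string;
}

interface Monologue {
  speakerId: string;
  sentences: Sentence[];
}

// Same arithmetic as in the PR: whole hours, then zero-padded minutes.
function formatCallDuration(durationSeconds: number): string {
  const hours = Math.floor(durationSeconds / 3600);
  const minutes = Math.floor((durationSeconds % 3600) / 60);
  return `${hours} hours ${minutes < 10 ? `0${minutes}` : minutes} minutes`;
}

// Emits a "# <email>" heading whenever the speaker changes, followed by one
// line per sentence. Speakers missing from the map are labeled as unknown.
function renderTranscriptBody(
  monologues: Monologue[],
  speakerToEmailMap: Record<string, string>
): string {
  let body = "";
  let lastSpeakerId: string | null = null;
  for (const monologue of monologues) {
    if (monologue.speakerId !== lastSpeakerId) {
      const speaker = speakerToEmailMap[monologue.speakerId] || "Unknown speaker";
      body += `# ${speaker}\n`;
      lastSpeakerId = monologue.speakerId;
    }
    for (const sentence of monologue.sentences) {
      body += `${sentence.text}\n`;
    }
  }
  return body;
}

const duration = formatCallDuration(3725);
const body = renderTranscriptBody(
  [
    { speakerId: "a", sentences: [{ text: "Hi" }, { text: "How are you?" }] },
    { speakerId: "a", sentences: [{ text: "Still me." }] },
    { speakerId: "b", sentences: [{ text: "Hello" }] },
  ],
  { a: "alice@example.com" }
);
```

Keeping these steps pure makes them easy to unit test separately from the database and data source calls.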