Improve performance on metadata computation #12785

Merged
merged 24 commits into from
Jun 23, 2025

Conversation

charlesBochet
Member

@charlesBochet charlesBochet commented Jun 23, 2025

In this PR:

Improve recompute metadata cache performance. We are aiming for ~100ms

Deleting the relationMetadata table and the FKs pointing to it
Fetching indexMetadata and indexFieldMetadata in a separate query, because the query TypeORM generates for the joined fetch is suboptimal
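
A minimal sketch of the "separate query" idea, under stated assumptions: the row shapes and the `attachIndexes` helper below are hypothetical and are not taken from the PR. The point is only that two flat result sets can be loaded independently and stitched together in memory, rather than asking the ORM to join them.

```typescript
// Hypothetical row shapes; the real entities carry many more columns.
interface ObjectMetadataRow {
  id: string;
  nameSingular: string;
}

interface IndexMetadataRow {
  id: string;
  objectMetadataId: string;
  name: string;
}

interface ObjectMetadataWithIndexes extends ObjectMetadataRow {
  indexMetadatas: IndexMetadataRow[];
}

// Groups the separately fetched index rows by parent object id,
// then attaches each group to its object in a single pass.
function attachIndexes(
  objects: ObjectMetadataRow[],
  indexes: IndexMetadataRow[],
): ObjectMetadataWithIndexes[] {
  const indexesByObjectId: Record<string, IndexMetadataRow[]> = {};

  for (const index of indexes) {
    const bucket = indexesByObjectId[index.objectMetadataId] ?? [];
    bucket.push(index);
    indexesByObjectId[index.objectMetadataId] = bucket;
  }

  return objects.map((object) => ({
    ...object,
    indexMetadatas: indexesByObjectId[object.id] ?? [],
  }));
}
```

Objects without any index simply receive an empty array, so callers do not need a null check.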

Remove caching lock

As recomputing the metadata cache is now lighter, we no longer prevent multiple concurrent computations. This also simplifies interfaces

Introduce self-recovery mechanisms to recompute the cache automatically if it is corrupted

Aka getFreshObjectMetadataMaps
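
The description does not spell out how getFreshObjectMetadataMaps detects a corrupted cache. One plausible shape, assuming a version comparison between cache and database, is sketched below. Everything here (`CachedMaps`, `FreshMapsDeps`, the function name with its `Sketch` suffix) is hypothetical and only illustrates the recover-by-recomputing idea.

```typescript
// Hypothetical shapes and dependencies, injected so the sketch runs standalone.
interface CachedMaps {
  metadataVersion: number;
  byId: Record<string, { id: string; nameSingular: string }>;
}

interface FreshMapsDeps {
  readCache(workspaceId: string): Promise<CachedMaps | undefined>;
  readDbVersion(workspaceId: string): Promise<number>;
  recompute(workspaceId: string): Promise<CachedMaps>;
}

async function getFreshObjectMetadataMapsSketch(
  workspaceId: string,
  deps: FreshMapsDeps,
): Promise<CachedMaps> {
  // The cache entry and the database version are read concurrently.
  const [cached, dbVersion] = await Promise.all([
    deps.readCache(workspaceId),
    deps.readDbVersion(workspaceId),
  ]);

  // A missing or stale entry triggers a recompute instead of an error.
  if (cached === undefined || cached.metadataVersion !== dbVersion) {
    return deps.recompute(workspaceId);
  }

  return cached;
}
```

Because no lock is taken, two concurrent callers may both recompute; the sketch assumes, as the PR does, that recomputation is cheap enough for this to be acceptable.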

Custom object resolver performance improvement: from 1 s to 200 ms

Double-check the queries and indexes used while creating a custom object
Replace DB queries with lookups in the cached objectMetadataMap

Reduce objectMetadataMaps to 500 kB

We used to store 3 fieldMetadataMaps (byId, byName, byJoinColumnName). While this is great for devXP, it is not great for performance.
Using the same mechanism as for objectMetadataMap, we now keep only the byId map and introduce two other maps, idByName and idByJoinColumnName, to bridge from a name or join column name to the id
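
To illustrate the layout described above, here is a small sketch. The type and property names (`FieldMaps`, `fieldsById`, `fieldIdByName`, `fieldIdByJoinColumnName`) and both helper functions are assumptions made for this example; only the idea of one full map plus two id-only bridge maps comes from the description.

```typescript
// Simplified, hypothetical field shape.
interface FieldMetadata {
  id: string;
  name: string;
  joinColumnName?: string;
}

// Only one map stores full field objects; the other two store ids,
// so each field is serialized once instead of three times.
interface FieldMaps {
  fieldsById: Record<string, FieldMetadata>;
  fieldIdByName: Record<string, string>;
  fieldIdByJoinColumnName: Record<string, string>;
}

function buildFieldMaps(fields: FieldMetadata[]): FieldMaps {
  const maps: FieldMaps = {
    fieldsById: {},
    fieldIdByName: {},
    fieldIdByJoinColumnName: {},
  };

  for (const field of fields) {
    maps.fieldsById[field.id] = field;
    maps.fieldIdByName[field.name] = field.id;
    if (field.joinColumnName !== undefined) {
      maps.fieldIdByJoinColumnName[field.joinColumnName] = field.id;
    }
  }

  return maps;
}

// Lookup by name is a two-step hop: name -> id -> field.
function getFieldByName(
  maps: FieldMaps,
  name: string,
): FieldMetadata | undefined {
  const id = maps.fieldIdByName[name];
  return id === undefined ? undefined : maps.fieldsById[id];
}
```

The trade-off is one extra property access per lookup in exchange for a payload that no longer repeats every field object in each map.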

Add dataloader on IndexMetadata (aka indexMetadataList in the API)

Improve field resolver performance too

Deprecate ClientConfig

Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

Major refactoring to improve metadata computation performance by removing the caching lock mechanism and simplifying cache operations across the application.

  • Removed ignoreLock parameter and associated locking logic from all cache recomputation methods, suggesting a shift towards a more efficient caching strategy without manual locks
  • Added getFreshObjectMetadataMaps as the main entry point for metadata retrieval in WorkspaceMetadataCacheService
  • Introduced composite index on workspaceId and createdAt columns in DataSourceEntity to optimize query performance
  • Removed relationMetadata table and its constraints through migration, though migration strategy needs improvement
  • Enabled comprehensive TypeORM logging in core datasource, which could impact production performance

23 files reviewed, 6 comments

Comment on lines 157 to 162
const { objectMetadataMaps: cachedObjectMetadataMaps } =
  await this.workspaceMetadataCacheService.getFreshObjectMetadataMaps(
    {
      workspaceId,
    },
  );
Contributor

style: this could create unnecessary recomputation overhead since getFreshObjectMetadataMaps() is not using cached values. Consider using getObjectMetadataMaps() first and only call getFreshObjectMetadataMaps() if cache is missing
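
The suggestion above could look roughly like the following. This is a sketch only: `MetadataCacheLike` is a hypothetical interface with simplified signatures (the real getFreshObjectMetadataMaps takes an object argument, as the snippet under review shows), and `getMapsPreferringCache` is a name invented for this example.

```typescript
// Hypothetical, simplified view of the cache service.
interface MetadataCacheLike<T> {
  getObjectMetadataMaps(workspaceId: string): Promise<T | undefined>;
  getFreshObjectMetadataMaps(workspaceId: string): Promise<T>;
}

// Tries the cheap cached read first and only falls back to the
// potentially recomputing path when nothing is cached.
async function getMapsPreferringCache<T>(
  cache: MetadataCacheLike<T>,
  workspaceId: string,
): Promise<T> {
  const cached = await cache.getObjectMetadataMaps(workspaceId);

  if (cached !== undefined) {
    return cached;
  }

  return cache.getFreshObjectMetadataMaps(workspaceId);
}
```

Whether this is preferable depends on what the fresh path actually costs; if it already short-circuits on a valid cache entry, the extra wrapper adds little.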

Comment on lines +247 to +248
async flush(workspaceId: string, metadataVersion?: number): Promise<void> {
  await this.flushVersionedMetadata(workspaceId, metadataVersion);
Contributor

style: consider awaiting all cache deletions in parallel using Promise.all for better performance
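
As an illustration of the parallel-deletion suggestion, here is a minimal sketch. `CacheStorageLike` and `flushKeysInParallel` are hypothetical names; the only thing assumed about the real cache storage is a promise-returning `del(key)`, as seen in the removeUserWorkspaceRoleMap snippet.

```typescript
// Hypothetical minimal view of the cache storage service.
interface CacheStorageLike {
  del(key: string): Promise<unknown>;
}

// Starts every deletion immediately and waits for all of them,
// instead of awaiting each one before starting the next.
async function flushKeysInParallel(
  storage: CacheStorageLike,
  keys: string[],
): Promise<void> {
  await Promise.all(keys.map((key) => storage.del(key)));
}
```

Note that `Promise.all` rejects as soon as any single deletion fails, while the remaining deletions keep running in the background; `Promise.allSettled` is an alternative when every outcome needs to be inspected.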

Comment on lines 109 to 113
removeUserWorkspaceRoleMap(workspaceId: string) {
  return this.cacheStorageService.del(
    `${WorkspaceCacheKeys.MetadataPermissionsUserWorkspaceRoleMap}:${workspaceId}`,
  );
}
Contributor

logic: Should also remove the version when removing the map to prevent stale version references

Contributor

github-actions bot commented Jun 23, 2025

🚀 Preview Environment Ready!

Your preview environment is available at: http://bore.pub:27435

This environment will automatically shut down when the PR is closed or after 5 hours.

await queryRunner.query(
`CREATE INDEX "IDX_DATA_SOURCE_WORKSPACE_ID_CREATED_AT" ON "core"."dataSource" ("workspaceId", "createdAt") `,
);
await queryRunner.query(
Member Author

also removing this

Collaborator

why ?

Member

@Weiko Weiko left a comment

  • "also introduce self-recovery mechanisms to recompute cache automatically if corrupted": I don't see it. Do you mean you do that by comparing versions in cache vs. DB, or did you introduce something new?
  • Do we really want to remove ignoreLock? What about race conditions?
    It seems we were already using ignoreLock: true almost everywhere so I guess it's fine
  • CC @ijreilly for permissions change :)

@@ -89,6 +94,11 @@ export class ObjectMetadataService extends TypeOrmQueryService<ObjectMetadataEnt
override async createOne(
objectMetadataInput: CreateObjectInput,
): Promise<ObjectMetadataEntity> {
const { objectMetadataMaps } =
Member

Cool!

@@ -10,6 +11,7 @@ export const buildDefaultFieldsForCustomObject = (
workspaceId: string,
): Partial<FieldMetadataEntity>[] => [
{
id: v4(),
Member

why do we need this?

Member Author

because I want to avoid making multiple calls to db if not needed

@charlesBochet
Member Author

  • "also introduce self-recovery mechanisms to recompute cache automatically if corrupted": I don't see it. Do you mean you do that by comparing versions in cache vs. DB, or did you introduce something new?
  • Do we really want to remove ignoreLock? What about race conditions?
    It seems we were already using ignoreLock: true almost everywhere so I guess it's fine
  • CC @ijreilly for permissions change :)

I've introduced 'getFreshMetadata'

Member

@Weiko Weiko left a comment

LGTM

Contributor

📊 API Changes Report

REST API Changes

Summary

🔄 Changed Operations (63)

  • /apiKeys/duplicates: Modified operation
  • /attachments/duplicates: Modified operation
  • /blocklists/duplicates: Modified operation
  • /calendarChannelEventAssociations/duplicates: Modified operation
  • /calendarChannels/duplicates: Modified operation
  • /calendarEventParticipants/duplicates: Modified operation
  • /calendarEvents/duplicates: Modified operation
  • /companies/duplicates: Modified operation
  • /connectedAccounts/duplicates: Modified operation
  • /favoriteFolders/duplicates: Modified operation
  • /favorites/duplicates: Modified operation
  • /messageChannelMessageAssociations/duplicates: Modified operation
  • /messageChannels/duplicates: Modified operation
  • /messageFolders/duplicates: Modified operation
  • /messageParticipants/duplicates: Modified operation
  • /messages/duplicates: Modified operation
  • /messageThreads/duplicates: Modified operation
  • /notes/duplicates: Modified operation
  • /noteTargets/duplicates: Modified operation
  • /opportunities/duplicates: Modified operation
  • /people/duplicates: Modified operation
  • /pets/duplicates: Modified operation
  • /rockets/duplicates: Modified operation
  • /surveyResults/duplicates: Modified operation
  • /tasks/duplicates: Modified operation
  • /taskTargets/duplicates: Modified operation
  • /timelineActivities/duplicates: Modified operation
  • /viewFields/duplicates: Modified operation
  • /viewFilterGroups/duplicates: Modified operation
  • /viewFilters/duplicates: Modified operation
  • /viewGroups/duplicates: Modified operation
  • /views/duplicates: Modified operation
  • /viewSorts/duplicates: Modified operation
  • /webhooks/duplicates: Modified operation
  • /workflowAutomatedTriggers/duplicates: Modified operation
  • /workflowRuns/duplicates: Modified operation
  • /workflows/duplicates: Modified operation
  • /workflowVersions/duplicates: Modified operation
  • /workspaceMembers/duplicates: Modified operation
  • /apiKeys: Modified operation
  • /apiKeys/{id}: Modified operation
  • /calendarEvents: Modified operation
  • /calendarEvents/{id}: Modified operation
  • /companies: Modified operation
  • /companies/{id}: Modified operation
  • /opportunities: Modified operation
  • /opportunities/{id}: Modified operation
  • /people: Modified operation
  • /people/{id}: Modified operation
  • /pets: Modified operation
  • /pets/{id}: Modified operation
  • /viewFields: Modified operation
  • /viewFields/{id}: Modified operation
  • /viewFilters: Modified operation
  • /viewFilters/{id}: Modified operation
  • /viewGroups: Modified operation
  • /viewGroups/{id}: Modified operation
  • /views: Modified operation
  • /views/{id}: Modified operation
  • /viewSorts: Modified operation
  • /viewSorts/{id}: Modified operation
  • /workspaceMembers: Modified operation
  • /workspaceMembers/{id}: Modified operation

⚠️ Please review these API changes carefully before merging.

⚠️ Breaking Change Protocol

Breaking changes detected but PR title does not contain "breaking" - CI will pass but action needed.

🔄 Options:

  1. If this IS a breaking change: Add "breaking" to your PR title and add BREAKING CHANGE: to your commit message
  2. If this is NOT a breaking change: The API diff tool may have false positives - please review carefully

For breaking changes, add to commit message:

feat: add new API endpoint

BREAKING CHANGE: removed deprecated field from User schema

@charlesBochet charlesBochet merged commit d5c9740 into main Jun 23, 2025
57 checks passed
@charlesBochet charlesBochet deleted the improve-performance-data-model branch June 23, 2025 19:06
sentry-io bot commented Jun 23, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

etiennejouan added a commit that referenced this pull request Jun 27, 2025
…n front (#12886)

Context:
- IndexFieldMetadata was no longer available on the 'objects' gql query ([since this PR](#12785)). As a result, uniqueness checks on import no longer work.

Fix:
- Add dataloader logic for indexFieldMetadata
- Add an extra check in the uniqueness hook on import
3 participants