Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix puml not syncing Resolve #1617 #2345

Merged
merged 44 commits into from
Aug 27, 2023
Merged

Fix puml not syncing Resolve #1617 #2345

merged 44 commits into from
Aug 27, 2023

Conversation

SPWwj
Copy link
Contributor

@SPWwj SPWwj commented Jul 23, 2023

What is the purpose of this pull request?

  • Documentation update
  • Bug fix
  • Feature addition or enhancement
  • Code maintenance
  • DevOps
  • Improve developer experience
  • Others, please explain:

Overview of changes:
Resolve #1617
By introducing a lock service to wait for asynchronous functions to complete:

// Importing LockManager module from the specified directory
const LockManager = require('../../utils/LockManager');

// Start an async function
const lockId = LockManager.createLock();

// Stop of an async function
LockManager.deleteLock(lockId);

// Wait for all async functions to complete
await LockManager.waitForLockRelease();

Fix the issue where all distinct PUML diagrams were being reprocessed on every hot reload by introducing a hashmap tracker to keep track of the diagrams.

  • The hash key comes from the content and the file path of the PUML diagram.

Anything you'd like to highlight/discuss:
Performance Analysis:

The implementation of the lock service has minimal impact on PUML processing performance.

Testing was conducted on 70 PUML diagrams:

  • Original Configuration (without lock service)
    - o
    image

    • CPU Utilization: 12% to 93%
    • RAM Usage: 60% to 68%
    • Hot Reload CPU Utilization: 12% to 94%
    • Hot Reload RAM Usage: 60% to 69%
      - d
    image
    • Initial CPU Utilization: 12% to 97%
    • Initial RAM Usage: 63% to 78%
    • Hot Reload CPU Utilization: 6% to 94%
    • Hot Reload RAM Usage: 63% to 75%
  • Singleton Implementation
    -o
    image
    Initial Performance Statistics:

    • CPU Utilization: 12% to 95%
    • RAM Usage: 63% to 72%
    • On every single diagram hot reload: RAM shows no change, CPU increases by 6%

    -d
    image

    Initial Performance Statistics:

    • CPU Utilization: 7% to 96%
    • RAM Usage: 63% to 74%
    • On every single diagram hot reload: RAM shows no change, CPU increases by 6%

These results indicate that the lock service does not significantly affect PUML processing performance.
In fact, it greatly reduces the CPU computation on every hot reload, leading to a more efficient processing system.

Testing instructions:

  1. Check out the PR locally.
  2. Create a new MarkBind project using markbind init.
  3. Run markbind serve -o.
  4. Experiment with the sample:
<puml width="300">
@startuml
alice -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>
  1. Edit the sample and wait for the hot reload to see the effect.

Proposed commit message: (wrap lines at 72 characters)
Introduce LockManager for Async Handling and Optimize PUML Diagrams

Introduce the LockManager module to manage the lifecycle of asynchronous
functions, ensuring robust and consistent handling within synchronous
methods.

Utilize the createLock, deleteLock, and waitForLockRelease methods for
starting, stopping, and waiting for async functions respectively.

Code snippet:

const LockManager = require('path_to/LockManager');
const lockId = LockManager.createLock();
LockManager.deleteLock(lockId);
await LockManager.waitForLockRelease();

Fix the issue where all distinct PUML diagrams were being reprocessed on
every hot reload by introducing a hashmap tracker.

  • The hash key comes from the content and the file name of the PUML diagram.
  • The original `processedDiagrams.clear()` has been updated to
    `cleanDiagrams(processedDiagrams)`.

By implementing these changes, the code is made more maintainable,
asynchronous functions are properly managed, and PUML diagram processing
is optimized.


Checklist: ☑️

  • Updated the documentation for feature additions and enhancements
  • Added tests for bug fixes or features
  • Linked all related issues
  • No unrelated changes

@SPWwj SPWwj marked this pull request as draft July 23, 2023 12:05
@ang-zeyu
Copy link
Contributor

The performance hit from execSync is quite significant (2103 site is a good demo ground).

The fix needs to be more involved #1617 (comment).

@SPWwj
Copy link
Contributor Author

SPWwj commented Jul 24, 2023

The performance hit from execSync is quite significant (2103 site is a good demo ground).

The fix needs to be more involved #1617 (comment).

I have an alternative solution using the lock file. Not sure if this could fix the issue.

@SPWwj SPWwj marked this pull request as ready for review July 24, 2023 07:10
@tlylt
Copy link
Contributor

tlylt commented Jul 26, 2023

@SPWwj is there a real need to implement it as a lock file? Can we do it in memory?

Also pls update your PR description

@SPWwj
Copy link
Contributor Author

SPWwj commented Jul 27, 2023

@SPWwj is there a real need to implement it as a lock file? Can we do it in memory?

Also pls update your PR description

Yes, a similar idea would be to use a singleton class instead of files for this lock service.

Another way is to use Vue components.

Alternatively, we could make NodeProcessor.ts asynchronous, but this would require a lot of work.

@tlylt
Copy link
Contributor

tlylt commented Jul 27, 2023

@SPWwj is there a real need to implement it as a lock file? Can we do it in memory?
Also pls update your PR description

Yes, a similar idea would be to use a singleton class instead of files for this lock service.

Another way is to use Vue components.

Alternatively, we could make NodeProcessor.ts asynchronous, but this would require a lot of work.

pros & cons? rationale behind your decision? could you elaborate to aid review plz

@SPWwj
Copy link
Contributor Author

SPWwj commented Jul 27, 2023

pros & cons? rationale behind your decision? could you elaborate to aid review plz

  1. Use a singleton class instead of files for this lock service.

    • Pro: More efficient and faster than IO operations. Easy to implement without affecting the current implementation.
    • Cons: Extra utility class needed for asynchronous operations.
  2. Use Vue components.

    • Pro: Easy to implement.
    • Cons: Results in a larger client-side library. Need to re-implement the component.
  3. Make NodeProcessor.ts asynchronous.

    • Pro: Supports the asynchronous operation for the NodeProcessor pipeline.
    • Cons: Need to re-implement the pipeline. This is similar to a TypeScript migration and can be very tedious.

The first option (singleton) is recommended as it does not affect the current project structure and provides a way to support asynchronous operations.

.gitignore Outdated Show resolved Hide resolved
@tlylt
Copy link
Contributor

tlylt commented Jul 29, 2023

pros & cons? rationale behind your decision? could you elaborate to aid review plz

  1. Use a singleton class instead of files for this lock service.

    • Pro: More efficient and faster than IO operations. Easy to implement without affecting the current implementation.
    • Cons: Extra utility class needed for asynchronous operations.

The first option (singleton) is recommended as it does not affect the current project structure and provides a way to support asynchronous operations.

Sounds like a plan. What about performance (time, and maybe cpu/memory?) before and after introducing this change? How much longer(or shorter or no diff) would it take, especially for sites with a lot of puml diagrams?

@SPWwj
Copy link
Contributor Author

SPWwj commented Jul 30, 2023

Sounds like a plan. What about performance (time, and maybe cpu/memory?) before and after introducing this change? How much longer(or shorter or no diff) would it take, especially for sites with a lot of puml diagrams?

I don't expect much performance difference as the lock service does not change the current implementation. The only thing it does is to wait for all asynchronous operations to complete, much like a join.

The compiling time may slightly increase after applying this lockfile, due to PlantUML is able to finish generating the file, just like in the test case mentioned above.

@tlylt
Copy link
Contributor

tlylt commented Jul 30, 2023

Sounds like a plan. What about performance (time, and maybe cpu/memory?) before and after introducing this change? How much longer(or shorter or no diff) would it take, especially for sites with a lot of puml diagrams?

I don't expect much performance difference if the lock service does not change the current implementation. The only thing it does is to wait for all asynchronous operations to complete, much like a join.

The compiling time may slightly increase after applying this lockfile, due to PlantUML is able to finish generating the file, just like in the test case mentioned above.

Sure, have some numbers when you are done so that we can make a fair judgment 👍

@damithc
Copy link
Contributor

damithc commented Jul 30, 2023

Sure, have some numbers when you are done so that we can make a fair judgment

Good idea. Use a decent number of puml diagram in the test case.

@SPWwj
Copy link
Contributor Author

SPWwj commented Jul 31, 2023

Sure, have some numbers when you are done so that we can make a fair judgment

Good idea. Use a decent number of puml diagram in the test case.

@damithc @tlyltI have updated the PR description to include some test statistics.

@tlylt
Copy link
Contributor

tlylt commented Aug 2, 2023

@SPWwj is this ready for review? There's still a merge conflict

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 2, 2023

@SPWwj is this ready for review? There's still a merge conflict

Fixed merged conflict

.gitignore Outdated Show resolved Hide resolved
packages/core/src/utils/LockManager.ts Show resolved Hide resolved
packages/core/src/utils/LockManager.ts Show resolved Hide resolved
Copy link
Contributor

@tlylt tlylt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @SPWwj, looks mostly good on my end, thank you for your quick response. Have one more concern as I read through the diagram status tracking code:

  • Could you add in comments to explain the functions? the definition of "dead" and "alive" weren't as clear. I initially thought you meant "stale" and "fresh", but the logic doesn't add up.
  • Could you illustrate the flow of events for the typical cases:
    • on first generation, what happens? (especially the state of processedDiagrams)
    • after edit of a puml file, what happens? (especially the state of processedDiagrams)
    • after edit of a non puml file, what happens? (especially the state of processedDiagrams)

I'm not sure whether processedDiagrams is working as advertised. Clarification from your end would help!

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 19, 2023

Hey @SPWwj, looks mostly good on my end, thank you for your quick response. Have one more concern as I read through the diagram status tracking code:

  • Could you add in comments to explain the functions? the definition of "dead" and "alive" weren't as clear. I initially thought you meant "stale" and "fresh", but the logic doesn't add up.

  • Could you illustrate the flow of events for the typical cases:

    • on first generation, what happens? (especially the state of processedDiagrams)
    • after edit of a puml file, what happens? (especially the state of processedDiagrams)
    • after edit of a non puml file, what happens? (especially the state of processedDiagrams)

I'm not sure whether processedDiagrams is working as advertised. Clarification from your end would help!

On first generation, all the diagrams will be added to the map, and whenever a diagram is generated,
it will be marked as fresh.
Upon editing a PUML file or a non-PUML file that triggers a hot reload on the site,
the stale diagram will be removed from the map, and the map will be reset to all stale.
If the diagram is inside the map, the generateDiagram function will not regenerate the diagram.
This is to prevent generating the same diagram twice.

@tlylt
Copy link
Contributor

tlylt commented Aug 19, 2023

Upon editing a PUML file or a non-PUML file that triggers a hot reload on the site, the stale diagram will be removed from the map, and the map will be reset to all stale.

The stale diagram may not be removed immediately right?

Example A, B are diagrams

  1. Initial generation
    • processedMap = {A=true, B=true}
  2. Edit A (which creates C as a new diagram)
    • This will trigger a hot reload on the site
      • before regeneration: no stale diagrams, so no diagrams are deleted
      • before regeneration: now set A, B as stale, C is not in the map yet
        • processedMap = {A=false, B=false}
      • during regeneration: A in the map remains stale, B mark as fresh processedMap = {A=false, B=true}
      • during regeneration: C is added to the map and marked as fresh processedMap = {A=false, B=true, C=true}

And only after a further save operation, will A be removed from the map?

Also, those diagrams that are removed from the map, but they are not deleted right?

  • there wasn't an issue previously because we override the same diagram? Now with this new approach we need to perform cleanup to avoid accumulating a ton of diagrams in _site folder?

Finally, have you tried observing the actual values of processedDiagrams during initialization and after edit?
I console log the values and they seem quite strange.
For example, having two images in one page (and no more puml in other pages) by adding the following to an index.md of a newly generated site:

<puml width="300">
@startuml
A -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

<puml width="300">
@startuml
B -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

I would assume that processedDiagrams would have both files after markbind serve, but apparently when I console log, it only contains one:
(to illustrate)
image

Then, even though I just delete something unrelated in an index.md, the processedDiagrams values are updated for some weird reason to one true, one false:
image

FYI I added the log here:
image

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 20, 2023

The stale diagram may not be removed immediately right?

Yes, it will be removed in a future update.

Example A, B are diagrams

  1. Initial generation processedDiagrams {}

    • generateDiagram => processedDiagrams Add A and B to processedDiagrams {A=true, B=true}
  2. Edit A (which creates C as a new diagram)

    • This will trigger a hot reload on the site

      • before regeneration: no stale diagrams, so no diagrams are deleted
        • now set A, B as stale, C is not in the map yet
      • during regeneration: A in the map remains stale, B mark as fresh generateDiagram => {A=false, B=true, C = true}
        • Add C = true
        • Update B = True
        • Unchange A = stale
          3 Edit B (Which will create D as a new diagram)
    • This will trigger a hot reload on the site

      • before regeneration:
        • A will be remove
        • now set B, C as stale, D is not in the map yet
      • during regeneration: A in the map remains stale, B mark as fresh generateDiagram => { B=false, C = true, D = true}
        • Add D = True
        • Update C = True
        • Unchange B = False
          4 Future Edit repeat above steps.

Yes answered above

Also, those diagrams that are removed from the map, but they are not deleted right?

Yes

  • there wasn't an issue previously because we override the same diagram? Now with this new approach we need to perform cleanup to avoid accumulating a ton of diagrams in _site folder?

We can add a feature here to remove diagrams.

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 20, 2023

Upon editing a PUML file or a non-PUML file that triggers a hot reload on the site, the stale diagram will be removed from the map, and the map will be reset to all stale.

The stale diagram may not be removed immediately right?

Example A, B are diagrams

  1. Initial generation

    • processedMap = {A=true, B=true}
  2. Edit A (which creates C as a new diagram)

    • This will trigger a hot reload on the site

      • before regeneration: no stale diagrams, so no diagrams are deleted

      • before regeneration: now set A, B as stale, C is not in the map yet

        • processedMap = {A=false, B=false}
      • during regeneration: A in the map remains stale, B mark as fresh processedMap = {A=false, B=true}

      • during regeneration: C is added to the map and marked as fresh processedMap = {A=false, B=true, C=true}

And only after a further save operation, will A be removed from the map?

Also, those diagrams that are removed from the map, but they are not deleted right?

  • there wasn't an issue previously because we override the same diagram? Now with this new approach we need to perform cleanup to avoid accumulating a ton of diagrams in _site folder?

Finally, have you tried observing the actual values of processedDiagrams during initialization and after edit? I console log the values and they seem quite strange. For example, having two images in one page (and no more puml in other pages) by adding the following to an index.md of a newly generated site:

<puml width="300">
@startuml
A -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

<puml width="300">
@startuml
B -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

I would assume that processedDiagrams would have both files after markbind serve, but apparently when I console log, it only contains one: (to illustrate) image

This is because the diagram is only added to the map before the generation of a new diagram:
image

Then, even though I just delete something unrelated in an index.md, the processedDiagrams values are updated for some weird reason to one true, one false: image

FYI I added the log here: image

This is correct because the console log is placed at the beginning of the function. However, the following code is in charge of preventing the regeneration of the existing diagram:

image

@tlylt
Copy link
Contributor

tlylt commented Aug 21, 2023

This is correct because the console log is placed at the beginning of the function. However, the following code is in charge of preventing the regeneration of the existing diagram:

I see, thank you for that!

I adjusted the comments slightly as well as code (made unlink sync).

For inline use of puml diagrams via e.g.

<puml width="300">
@startuml
B -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

seems alright now, the only issue is the delayed deletion of the stale diagrams. Could be improved in the future but ok for now.

For use of src in puml, via e.g. <puml src="diagrams/ArchitectureSequenceDiagram.puml" width="574" />, I am encountering some problems. It seems like we might either have some collision or accidentally delete the wrong files.
To reproduce:

image

(On initial markbind serve, the _site images are present)
image

(After double save, only the first diagram remains, hence resulting in error)
image

@SPWwj could you investigate this, please?

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 25, 2023

This is correct because the console log is placed at the beginning of the function. However, the following code is in charge of preventing the regeneration of the existing diagram:

I see, thank you for that!

I adjusted the comments slightly as well as code (made unlink sync).

For inline use of puml diagrams via e.g.

<puml width="300">
@startuml
B -> bob ++ : hellos
bob -> bob ++ : self call
bob -> bib ++  #005500 : a
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

seems alright now, the only issue is the delayed deletion of the stale diagrams. Could be improved in the future but ok for now.

For use of src in puml, via e.g. <puml src="diagrams/ArchitectureSequenceDiagram.puml" width="574" />, I am encountering some problems. It seems like we might either have some collision or accidentally delete the wrong files. To reproduce:

image

(On initial markbind serve, the _site images are present) image

(After double save, only the first diagram remains, hence resulting in error) image

@SPWwj could you investigate this, please?

The hot reload scans only the PUML diagram on the modified page, causing diagrams on other non-modified pages to become stale. Currently, there's no good solution, as there's no way to identify inline PUML. Every change to inline PUML generates a new diagram without a common identifier to the stale diagram.

Since this is for production, one option is to clean up the PUML diagrams in the directory on init when no diagram is in the tracker list, to remove the redundant diagrams.

@tlylt
Copy link
Contributor

tlylt commented Aug 25, 2023

Does making the plantuml status tracking hashmap persist through site rebuild help to fix the issue? Either somehow make it global or write to file

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 25, 2023

Does making the plantuml status tracking hashmap persist through site rebuild help to fix the issue? Either somehow make it global or write to file

The issue won't be resolved, as we have no way to identify and remove the inline PUML diagram tracker from the tracking list. This is because there's no common identifier for the inline PUML diagram, unlike the .puml file where we can use the filename as an identifier.

One solution is to remove the diagram during the initial load with 'Markbind serve'. This ensures that our tracker list only captures the required diagrams.

@tlylt
Copy link
Contributor

tlylt commented Aug 26, 2023

Hey @SPWwj could you help me understand your replies further?

Every change to inline PUML generates a new diagram without a common identifier to the stale diagram.

Aren't we currently making the identifier depend on both the output file path and the puml content, meaning every change to any PUML diagram will generate a new diagram without a common identifier to the stale diagram? This includes both inline and file puml diagrams?

Since this is for production, one option is to clean up the PUML diagrams in the directory on init when no diagram is in the tracker list, to remove the redundant diagrams.

"what" is for production? What do you mean by "production" here?

One solution is to remove the diagram during the initial load with 'Markbind serve'. This ensures that our tracker list only captures the required diagrams.

Do you mean we now ONLY remove the diagram during the initial load with "markbind serve", and stop removing the diagrams afterwards? I think that is a viable solution.

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 26, 2023

Hey @SPWwj could you help me understand your replies further?

Every change to inline PUML generates a new diagram without a common identifier to the stale diagram.

Aren't we currently making the identifier depend on both the output file path and the puml content, meaning every change to any PUML diagram will generate a new diagram without a common identifier to the stale diagram? This includes both inline and file puml diagrams?

The goal is to remove unused, outdated diagrams.

Here's the relevant code snippet:
image

If node.attribs.name is present, a new diagram can overwrite the old one using the same imageOutputPath.
imageOutputPath also can be used to update the current tracking list.

    const imageOutputPath = path.resolve(config.outputPath, PUML_DIR, pathFromRootToImage);
    generateDiagram(imageOutputPath, pumlContent);

If node.attribs.name is missing, the old diagram can't be removed or replaced by a new one,
and tracking list will expand because we have no information on which diagram has been updated.

Since this is for production, one option is to clean up the PUML diagrams in the directory on init when no diagram is in the tracker list, to remove the redundant diagrams.

"what" is for production? What do you mean by "production" here?

When the site is ready to be deployed and go live.

One solution is to remove the diagram during the initial load with 'Markbind serve'. This ensures that our tracker list only captures the required diagrams.

Do you mean we now ONLY remove the diagram during the initial load with "markbind serve", and stop removing the diagrams afterwards? I think that is a viable solution.

Yes, because during the implementation of the website, I don't think there's a need to worry about deleting unused diagrams while editing PUML source code. This will also improve the undo operation during PUML code editing.

@tlylt
Copy link
Contributor

tlylt commented Aug 26, 2023

If node.attribs.name is missing, the old diagram can't be removed or replaced by a new one, and tracking list will expand because we have no information on which diagram has been updated.

Hmm, now it reminds me that we should ensure that our solution works for the case where the user edits the plantuml files without running markbind serve.

Yes, because during the implementation of the website, I don't think there's a need to worry about deleting unused diagrams while editing PUML source code. This will also improve the undo operation during PUML code editing.

Sure, sounds like a plan 👍.
To track, there are a few use cases to check in order to ensure the functionality works:

  • unused diagrams don't end up accumulating and occupying disk space (should at least be cleared after stopping and rerunning markbind serve
  • updates to inline puml should work
  • updates to puml with src to a file should work
  • updates to puml with the user-defined name should work
  • updates to puml done without running markbind serve should work

@SPWwj
Copy link
Contributor Author

SPWwj commented Aug 26, 2023

@tlylt Actually, I just realized that MarkBind already takes care of deleting redundant files during build time. Thus, there's no need to worry about accumulating and occupying disk space.

Copy link
Contributor

@tlylt tlylt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @SPWwj, LGTM 🚀

I updated the relevant docs to:

  • use the <puml src="...puml"> directly in our UG
  • remove the generated diagrams since they will be generated on the fly
  • update instructions in DG regarding puml files in UG
  • add a note on duplicated names may result in invalid output

And finally some minor refactoring/adjusting comments.

@tlylt tlylt added this to the v5.0.2 milestone Aug 27, 2023
@tlylt tlylt merged commit da9d3f7 into MarkBind:master Aug 27, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PlantUML diagrams require a manual refresh when live-previewing in lazy loading mode
4 participants