Add VPM size to detectPlatform output #57

wimrijnders · 2018-07-16T10:40:47Z

Now that DMA is enabled, it's interesting to know the actual size of the VPM.
This PR adds it to the output of `detectPlatform`.

Example output:

```
> sudo obj-qpu/bin/detectPlatform
Detected platform: Raspberry Pi 2 Model B Rev 1.1
Hardware revision: a01041
Number of slices: 3
Number of QPU's per slice: 4
Size of VPM: 12KB                    # <-- This is new
```

NOTE: This will probably not work on your machine until #52 and #53 have been merged.


wimrijnders · 2018-07-16T10:46:57Z

So, the VPM can hold 192 blocks of 16 floats (64 bytes each) at any time.
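The 192 figure follows from the reported 12KB VPM size, assuming 32-bit (4-byte) floats and 1 KB = 1024 bytes:

```latex
\[
\frac{12\ \text{KB}}{16 \times 4\ \text{B}}
  = \frac{12288\ \text{B}}{64\ \text{B}}
  = 192\ \text{blocks}
\]
```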

If I read the reference doc properly, the VPM is also used for outgoing data, so let's assume half of it, 96 blocks, is available for incoming data.

This means that when utilizing all 12 QPUs (3 slices x 4 QPUs), each QPU can prefetch 8 full blocks of data with DMA. There is definitely room here for keeping the QPUs busy. Looking forward to making maximal use of this.

EDIT: If the blocks for incoming data can be reused for writing back the results, that doubles to 16 blocks per QPU! Very exciting. I'm hoping I'm smart enough to figure out how to make this work; otherwise, I'm really hoping that you implement it.

wimrijnders · 2018-07-16T11:24:48Z

Also, I think it is now possible to use the VPM for local storage of data, correct? This opens up possibilities. Or would you regard that as an abuse of the VPM?

wimrijnders · 2018-07-16T11:44:47Z

I examined the VPM register definitions in the reference docs, p. 56 onwards. It appears that I was overly optimistic. Please confirm whether the following is correct, if you can:

A given QPU can initiate at most one DMA read and one DMA write at any given time. However, the read and the write can overlap.

In addition, I encountered something about limits on VPM usage for particular shader types, so possibly not all of the VPM is available at any given time. On the other hand, it's possible to configure which shaders are allowed to use the VPM.

So my grand scheme of maximal prefetching is not possible. Time to start thinking about something else.

wimrijnders · 2018-07-20T05:20:46Z

Last commit: added two more methods to RegisterMap, which I'm interested in:

- Number of TMUs per slice
- Whether the L2 cache is enabled

wimrijnders added 2 commits July 16, 2018 23:05

- Merge branch 'development' into add-vpm-size (96edab8)
- Add num TMU and L2 cache enabled to detectPlatform output (52d837e)

wimrijnders added 3 commits July 20, 2018 07:22

- Text edit (9ee5473)
- Merge branch 'development' into add-vpm-size (fbfe79f)
- Merge branch 'development' into add-vpm-size (3bbc1a9)

wimrijnders closed this Jul 12, 2020

wimrijnders deleted the add-vpm-size branch July 12, 2020 06:58
