Skip to content

Commit

Permalink
Merge branch 'sun4v-64bit-DMA'
Browse files Browse the repository at this point in the history
Tushar Dave says:

====================
sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU

ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
sun4v hypervisor PCI IOMMU v2 APIs.

Current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex that has a 2GB per root complex DVMA space
limit. The limit has become a scalability bottleneck nowadays that
a typical 10G/40G NIC can consume 500MB DVMA space per instance.
When DVMA resource is exhausted, devices will not be usable
since the driver can't allocate DVMA.

For example, we recently experienced legacy IOMMU limitation while
using i40e driver in system with large number of CPUs (e.g. 128).
Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has
512 (default) descriptors. So considering only RX queues (because RX
premap DMA buffers), i40e takes 4*128*512 number of DMA entries in
IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available
in table. So bringing up four instance of i40e alone saturate existing
IOMMU resource.

ATU removes bottleneck by allowing guest os to create IOTSB of size
32G (or more) with 64bit address ranges available in ATU HW. 32G is
more than enough DVMA space to be shared by all PCIe devices under
root complex contrast to 2G space provided by legacy IOMMU.

ATU allows PCIe devices to use 64bit DMA addressing. Devices
which choose to use 32bit DMA mask will continue to work with the
existing legacy IOMMU.

The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7)
and sun4u SPARC.

Thanks.
-Tushar

v2->v3:
- Patch #5 addresses comment by Joe Perches.
 -- use %s, __func__ instead of embedding the function name.

v1->v2:
- Patch #2 addresses comments by Dave M.
 -- use page allocator to allocate IOTSB.
 -- use true/false with boolean variables.
====================

Signed-off-by: David S. Miller <[email protected]>
  • Loading branch information
davem330 committed Nov 18, 2016
2 parents 87a349f + d30a6b8 commit 49cc0c4
Show file tree
Hide file tree
Showing 8 changed files with 849 additions and 60 deletions.
22 changes: 22 additions & 0 deletions arch/sparc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,14 @@ config ARCH_DEFCONFIG
config ARCH_PROC_KCORE_TEXT
def_bool y

config ARCH_ATU
bool
default y if SPARC64

config ARCH_DMA_ADDR_T_64BIT
bool
default y if ARCH_ATU

config IOMMU_HELPER
bool
default y if SPARC64
Expand Down Expand Up @@ -304,6 +312,20 @@ config ARCH_SPARSEMEM_ENABLE
config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64

config FORCE_MAX_ZONEORDER
int "Maximum zone order"
default "13"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
pages. This option selects the largest power of two that the kernel
keeps in the memory allocator. If you need to allocate very large
blocks of physically contiguous memory, then you may need to
increase this value.

This config option is actually maximum order plus one. For example,
a value of 13 means that the largest free memory block is 2^12 pages.

source "mm/Kconfig"

if SPARC64
Expand Down
343 changes: 343 additions & 0 deletions arch/sparc/include/asm/hypervisor.h
Original file line number Diff line number Diff line change
Expand Up @@ -2335,6 +2335,348 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
*/
#define HV_FAST_PCI_MSG_SETVALID 0xd3

/* PCI IOMMU v2 definitions and services
*
* While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO
* definitions and services.
*
* CTE Clump Table Entry. First level table entry in the ATU.
*
* pci_device_list
* A 32-bit aligned list of pci_devices.
*
* pci_device_listp
* real address of a pci_device_list. 32-bit aligned.
*
* iotte IOMMU translation table entry.
*
* iotte_attributes
* IO Attributes for IOMMU v2 mappings. In addition to
* read, write IOMMU v2 supports relax ordering
*
* io_page_list A 64-bit aligned list of real addresses. Each real
* address in an io_page_list must be properly aligned
* to the pagesize of the given IOTSB.
*
* io_page_list_p Real address of an io_page_list, 64-bit aligned.
*
* IOTSB IO Translation Storage Buffer. An aligned table of
* IOTTEs. Each IOTSB has a pagesize, table size, and
* virtual address associated with it that must match
* a pagesize and table size supported by the un-derlying
* hardware implementation. The alignment requirements
* for an IOTSB depend on the pagesize used for that IOTSB.
* Each IOTTE in an IOTSB maps one pagesize-sized page.
* The size of the IOTSB dictates how large of a virtual
* address space the IOTSB is capable of mapping.
*
* iotsb_handle An opaque identifier for an IOTSB. A devhandle plus
* iotsb_handle represents a binding of an IOTSB to a
* PCI root complex.
*
* iotsb_index Zero-based IOTTE number within an IOTSB.
*/

/* The index_count argument consists of two fields:
* bits 63:48 #iottes and bits 47:0 iotsb_index
*/
#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
(((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))

/* pci_iotsb_conf()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_CONF
* ARG0: devhandle
* ARG1: r_addr
* ARG2: size
* ARG3: pagesize
* ARG4: iova
* RET0: status
* RET1: iotsb_handle
* ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize
* EBADALIGN r_addr is not properly aligned
* ENORADDR r_addr is not a valid real address
* ETOOMANY No further IOTSBs may be configured
* EBUSY Duplicate devhandle, raddir, iova combination
*
* Create an IOTSB suitable for the PCI root complex identified by devhandle,
* for the DMA virtual address defined by the argument iova.
*
* r_addr is the properly aligned base address of the IOTSB and size is the
* IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to
* being configured. If it contains any values other than zeros then the
* behavior is undefined.
*
* pagesize is the size of each page in the IOTSB. Note that the combination of
* size (table size) and pagesize must be valid.
*
* virt is the DMA virtual address this IOTSB will map.
*
* If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
* Once configured, privileged access to the IOTSB memory is prohibited and
* creates undefined behavior. The only permitted access is indirect via these
* services.
*/
#define HV_FAST_PCI_IOTSB_CONF 0x190

/* pci_iotsb_info()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_INFO
* ARG0: devhandle
* ARG1: iotsb_handle
* RET0: status
* RET1: r_addr
* RET2: size
* RET3: pagesize
* RET4: iova
* RET5: #bound
* ERRORS: EINVAL Invalid devhandle or iotsb_handle
*
* This service returns configuration information about an IOTSB previously
* created with pci_iotsb_conf.
*
* iotsb_handle value 0 may be used with this service to inquire about the
* legacy IOTSB that may or may not exist. If the service succeeds, the return
* values describe the legacy IOTSB and I/O virtual addresses mapped by that
* table. However, the table base address r_addr may contain the value -1 which
* indicates a memory range that cannot be accessed or be reclaimed.
*
* The return value #bound contains the number of PCI devices that iotsb_handle
* is currently bound to.
*/
#define HV_FAST_PCI_IOTSB_INFO 0x191

/* pci_iotsb_unconf()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_UNCONF
* ARG0: devhandle
* ARG1: iotsb_handle
* RET0: status
* ERRORS: EINVAL Invalid devhandle or iotsb_handle
* EBUSY The IOTSB is bound and may not be unconfigured
*
* This service unconfigures the IOTSB identified by the devhandle and
* iotsb_handle arguments, previously created with pci_iotsb_conf.
* The IOTSB must not be currently bound to any device or the service will fail
*
* If the call succeeds, iotsb_handle is no longer valid.
*/
#define HV_FAST_PCI_IOTSB_UNCONF 0x192

/* pci_iotsb_bind()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_BIND
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: pci_device
* RET0: status
* ERRORS: EINVAL Invalid devhandle, iotsb_handle, or pci_device
* EBUSY A PCI function is already bound to an IOTSB at the same
* address range as specified by devhandle, iotsb_handle.
*
* This service binds the PCI function specified by the argument pci_device to
* the IOTSB specified by the arguments devhandle and iotsb_handle.
*
* The PCI device function is bound to the specified IOTSB with the IOVA range
* specified when the IOTSB was configured via pci_iotsb_conf. If the function
* is already bound then it is unbound first.
*/
#define HV_FAST_PCI_IOTSB_BIND 0x193

/* pci_iotsb_unbind()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_UNBIND
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: pci_device
* RET0: status
* ERRORS: EINVAL Invalid devhandle, iotsb_handle, or pci_device
* ENOMAP The PCI function was not bound to the specified IOTSB
*
* This service unbinds the PCI device specified by the argument pci_device
* from the IOTSB identified * by the arguments devhandle and iotsb_handle.
*
* If the PCI device is not bound to the specified IOTSB then this service will
* fail with status ENOMAP
*/
#define HV_FAST_PCI_IOTSB_UNBIND 0x194

/* pci_iotsb_get_binding()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_GET_BINDING
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: iova
* RET0: status
* RET1: iotsb_handle
* ERRORS: EINVAL Invalid devhandle, pci_device, or iova
* ENOMAP The PCI function is not bound to an IOTSB at iova
*
* This service returns the IOTSB binding, iotsb_handle, for a given pci_device
* and DMA virtual address, iova.
*
* iova must be the base address of a DMA virtual address range as defined by
* the iommu-address-ranges property in the root complex device node defined
* by the argument devhandle.
*/
#define HV_FAST_PCI_IOTSB_GET_BINDING 0x195

/* pci_iotsb_map()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_MAP
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: index_count
* ARG3: iotte_attributes
* ARG4: io_page_list_p
* RET0: status
* RET1: #mapped
* ERRORS: EINVAL Invalid devhandle, iotsb_handle, #iottes,
* iotsb_index or iotte_attributes
* EBADALIGN Improperly aligned io_page_list_p or I/O page
* address in the I/O page list.
* ENORADDR Invalid io_page_list_p or I/O page address in
* the I/O page list.
*
* This service creates and flushes mappings in the IOTSB defined by the
* arguments devhandle, iotsb.
*
* The index_count argument consists of two fields. Bits 63:48 contain #iotte
* and bits 47:0 contain iotsb_index
*
* The first mapping is created in the IOTSB index specified by iotsb_index.
* Subsequent mappings are created at iotsb_index+1 and so on.
*
* The attributes of each mapping are defined by the argument iotte_attributes.
*
* The io_page_list_p specifies the real address of the 64-bit-aligned list of
* #iottes I/O page addresses. Each page address must be a properly aligned
* real address of a page to be mapped in the IOTSB. The first entry in the I/O
* page list contains the real address of the first page, the 2nd entry for the
* 2nd page, and so on.
*
* #iottes must be greater than zero.
*
* The return value #mapped is the actual number of mappings created, which may
* be less than or equal to the argument #iottes. If the function returns
* successfully with a #mapped value less than the requested #iottes then the
* caller should continue to invoke the service with updated iotsb_index,
* #iottes, and io_page_list_p arguments until all pages are mapped.
*
* This service must not be used to demap a mapping. In other words, all
* mappings must be valid and have one or both of the RW attribute bits set.
*
* Note:
* It is implementation-defined whether I/O page real address validity checking
* is done at time mappings are established or deferred until they are
* accessed.
*/
#define HV_FAST_PCI_IOTSB_MAP 0x196

/* pci_iotsb_map_one()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_MAP_ONE
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: iotsb_index
* ARG3: iotte_attributes
* ARG4: r_addr
* RET0: status
* ERRORS: EINVAL Invalid devhandle,iotsb_handle, iotsb_index
* or iotte_attributes
* EBADALIGN Improperly aligned r_addr
* ENORADDR Invalid r_addr
*
* This service creates and flushes a single mapping in the IOTSB defined by the
* arguments devhandle, iotsb.
*
* The mapping for the page at r_addr is created at the IOTSB index specified by
* iotsb_index with the attributes iotte_attributes.
*
* This service must not be used to demap a mapping. In other words, the mapping
* must be valid and have one or both of the RW attribute bits set.
*
* Note:
* It is implementation-defined whether I/O page real address validity checking
* is done at time mappings are established or deferred until they are
* accessed.
*/
#define HV_FAST_PCI_IOTSB_MAP_ONE 0x197

/* pci_iotsb_demap()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_DEMAP
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: iotsb_index
* ARG3: #iottes
* RET0: status
* RET1: #unmapped
* ERRORS: EINVAL Invalid devhandle, iotsb_handle, iotsb_index or #iottes
*
* This service unmaps and flushes up to #iottes mappings starting at index
* iotsb_index from the IOTSB defined by the arguments devhandle, iotsb.
*
* #iottes must be greater than zero.
*
* The actual number of IOTTEs unmapped is returned in #unmapped and may be less
* than or equal to the requested number of IOTTEs, #iottes.
*
* If #unmapped is less than #iottes, the caller should continue to invoke this
* service with updated iotsb_index and #iottes arguments until all pages are
* demapped.
*/
#define HV_FAST_PCI_IOTSB_DEMAP 0x198

/* pci_iotsb_getmap()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_GETMAP
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: iotsb_index
* RET0: status
* RET1: r_addr
* RET2: iotte_attributes
* ERRORS: EINVAL Invalid devhandle, iotsb_handle, or iotsb_index
* ENOMAP No mapping was found
*
* This service returns the mapping specified by index iotsb_index from the
* IOTSB defined by the arguments devhandle, iotsb.
*
* Upon success, the real address of the mapping shall be returned in
* r_addr and thethe IOTTE mapping attributes shall be returned in
* iotte_attributes.
*
* The return value iotte_attributes may not include optional features used in
* the call to create the mapping.
*/
#define HV_FAST_PCI_IOTSB_GETMAP 0x199

/* pci_iotsb_sync_mappings()
* TRAP: HV_FAST_TRAP
* FUNCTION: HV_FAST_PCI_IOTSB_SYNC_MAPPINGS
* ARG0: devhandle
* ARG1: iotsb_handle
* ARG2: iotsb_index
* ARG3: #iottes
* RET0: status
* RET1: #synced
* ERROS: EINVAL Invalid devhandle, iotsb_handle, iotsb_index, or #iottes
*
* This service synchronizes #iottes mappings starting at index iotsb_index in
* the IOTSB defined by the arguments devhandle, iotsb.
*
* #iottes must be greater than zero.
*
* The actual number of IOTTEs synchronized is returned in #synced, which may
* be less than or equal to the requested number, #iottes.
*
* Upon a successful return, #synced is less than #iottes, the caller should
* continue to invoke this service with updated iotsb_index and #iottes
* arguments until all pages are synchronized.
*/
#define HV_FAST_PCI_IOTSB_SYNC_MAPPINGS 0x19a

/* Logical Domain Channel services. */

#define LDC_CHANNEL_DOWN 0
Expand Down Expand Up @@ -2993,6 +3335,7 @@ unsigned long sun4v_m7_set_perfreg(unsigned long reg_num,
#define HV_GRP_SDIO 0x0108
#define HV_GRP_SDIO_ERR 0x0109
#define HV_GRP_REBOOT_DATA 0x0110
#define HV_GRP_ATU 0x0111
#define HV_GRP_M7_PERF 0x0114
#define HV_GRP_NIAG_PERF 0x0200
#define HV_GRP_FIRE_PERF 0x0201
Expand Down
Loading

0 comments on commit 49cc0c4

Please sign in to comment.