docs.sun.com · Writing Device Drivers

DMA

7

Many devices can temporarily take control of the bus and perform data transfers to (and from) main memory or other devices. Since the device is doing the work without the help of the CPU, this type of data transfer is known as a direct memory access (DMA). DMA transfers can be performed between two devices, between a device and memory, or between memory and memory. This chapter covers transfers between a device and memory only.

DMA Model

The Solaris 2.x device driver interface/driver-kernel interface provides a high-level, architecture-independent model for DMA. This allows the framework (the DMA routines) to hide such architecture-specific details as:

Setting up DMA mappings
Building scatter-gather lists
Ensuring I/O and CPU caches are consistent

There are several abstractions that are used in the DDI/DKI to describe aspects of a DMA transaction. These include:

DMA object - Memory that is the source or destination of a DMA transfer.
DMA handle - An opaque object returned from a successful ddi_dma_alloc_handle(9F) call. The DMA handle is used in successive DMA subroutine calls to refer to the DMA object.

DMA cookie - A ddi_dma_cookie(9S) structure (ddi_dma_cookie_t) describes a contiguous portion of a DMA object that is entirely addressable by the device. It contains DMA addressing information required to program the DMA engine.

Rather than knowing that a platform needs to map an object (typically a memory buffer) into a special DMA area of the kernel address space, device drivers instead allocate DMA resources for the object. The DMA routines then perform any platform-specific operations needed to set the object up for DMA access. The driver receives a DMA handle to identify the DMA resources allocated for the object. This handle is opaque to the device driver; the driver must save the handle and pass it in subsequent calls to DMA routines, but should not interpret it in any way.

Operations are defined on a DMA handle that provide the following services:

Manipulating DMA resources
Synchronizing DMA objects
Retrieving attributes of the allocated resources

Types of Device DMA

Devices may perform one of the following three types of DMA.

Bus-Master DMA

If the device is capable of acting as a true bus master (where the DMA engine resides on the device board), the driver should program the device's DMA registers directly. The transfer address and count are obtained from the DMA cookie and given to the device.

Devices on current SPARC platforms use this form of DMA exclusively.

Third-party DMA

Third-party DMA utilizes a system DMA engine resident on the main system board, which has several DMA channels available for use by devices. The device relies on the system's DMA engine to perform the data transfers between the device and memory. The driver uses DMA engine routines (see ddi_dmae(9F)) to initialize and program the DMA engine. For each DMA data transfer, the driver programs the DMA engine and then gives the device a command to initiate the transfer in cooperation with that engine.

First-party DMA

Under first-party DMA, the device drives its own DMA bus cycles using a channel from the system's DMA engine. The ddi_dmae_1stparty(9F) function is used to configure this channel in a cascade mode so that the DMA engine will not interfere with the transfer.

DMA and DVMA

The platform that the device operates on may provide one of two types of memory access: direct memory access (DMA) or direct virtual memory access (DVMA).

On platforms that support DMA, the system provides the device with a physical address in order to perform transfers. In this case, one logical transfer may actually consist of a number of physically discontiguous transfers. An example of this occurs when an application transfers a buffer that spans several contiguous virtual pages that map to physically discontiguous pages. To deal with the discontiguous memory, devices for these platforms usually have some kind of scatter-gather DMA capability. Typically, x86 systems provide physical addresses for direct memory transfers.
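To make the idea of physically discontiguous transfers concrete, the following is a minimal user-space sketch, not DDI code. It groups the physical pages that back a virtually contiguous, page-aligned buffer into physically contiguous runs, one run per transfer. The names seg_t and coalesce_pages() and the 4-Kbyte PAGE_SZ are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u   /* assumed page size, for illustration only */

/* Hypothetical stand-in for one physically contiguous transfer. */
typedef struct {
    uint64_t addr;      /* physical start address */
    uint64_t size;      /* length in bytes */
} seg_t;

/*
 * Walk the physical page addresses that back a page-aligned buffer
 * and merge physically adjacent pages into one segment.
 * Returns the number of segments written to out (at most max_out).
 */
size_t
coalesce_pages(const uint64_t *pages, size_t npages,
    seg_t *out, size_t max_out)
{
    size_t nseg = 0;

    assert(pages != NULL && out != NULL);
    for (size_t i = 0; i < npages; i++) {
        if (nseg > 0 &&
            out[nseg - 1].addr + out[nseg - 1].size == pages[i]) {
            out[nseg - 1].size += PAGE_SZ;  /* extends previous run */
        } else {
            if (nseg == max_out)
                break;                      /* output array is full */
            out[nseg].addr = pages[i];
            out[nseg].size = PAGE_SZ;
            nseg++;
        }
    }
    return (nseg);
}
```

Each resulting segment corresponds to one entry that a scatter-gather capable device would need in order to cover the whole buffer.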

On platforms that support DVMA, the system provides the device with a virtual address to perform transfers. In this case, the underlying platform provides some form of memory management unit (MMU) that translates device accesses to these virtual addresses into the proper physical addresses. The device transfers to and from a contiguous virtual image that may be mapped to discontiguous physical pages. Devices that operate in these platforms don't need scatter-gather DMA capability. Typically, the system that supports SPARC platforms provides virtual addresses for direct memory transfers.
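The translation step that a DVMA platform performs can be modeled in a few lines. This is a user-space sketch under stated assumptions: iommu_t and iommu_translate() are hypothetical names, and the 4-Kbyte I/O page size is chosen only for the example. It shows how a contiguous range of device-visible virtual addresses can land on unrelated physical pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 12u                           /* assumed 4 KB I/O pages */
#define IO_PAGE_MASK  ((1u << IO_PAGE_SHIFT) - 1u)

/* Hypothetical I/O MMU: entry i holds the physical page for I/O page i. */
typedef struct {
    const uint64_t *phys_page;  /* physical page base addresses */
    size_t npages;              /* number of valid entries */
} iommu_t;

/* Translate a device-visible virtual address to a physical address. */
uint64_t
iommu_translate(const iommu_t *mmu, uint32_t dvma)
{
    size_t idx = dvma >> IO_PAGE_SHIFT;

    assert(idx < mmu->npages);
    return (mmu->phys_page[idx] + (dvma & IO_PAGE_MASK));
}
```

Because the device only ever sees the virtual addresses, it can treat the buffer as one contiguous region, which is why no scatter-gather capability is needed in this model.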

Handles, Windows, and Cookies

A DMA handle is an opaque pointer representing an object (usually a memory buffer or address) where a device can perform DMA transfers. Several different calls to DMA routines use the handle to identify the DMA resources allocated for the object.

An object represented by a DMA handle is completely covered by one or more DMA cookies. A DMA cookie represents a contiguous piece of memory to or from which the DMA engine can transfer data. The system uses the information in the DMA attribute structure, and the memory location and alignment of the target object, to decide how to divide an object into multiple cookies.

If the object is too big to fit the request within system resource limitations, it has to be broken up into multiple DMA windows. Only one window is activated at one time and has resources allocated. The ddi_dma_getwin(9F) function is used to position between windows within an object. Each DMA window consists of one or more DMA cookies.

Scatter-Gather

Some DMA engines may be able to accept more than one cookie. Such engines can perform scatter-gather I/O without the help of the system. In this case, it is most efficient if the driver uses ddi_dma_nextcookie(9F) to get as many cookies as the DMA engine can handle and program them all into the engine. The device can then be programmed to transfer the total number of bytes covered by all these DMA cookies combined.

DMA Operations

The steps involved in a DMA transfer are similar among the types of DMA.

Bus-Master DMA

In general, the following steps must be performed for bus-master DMA.

Describe the DMA attributes. This allows the routines to ensure that the device will be able to access the buffer.
Allocate a DMA handle.

Lock the DMA object in memory (see physio(9F)).

Note - This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.

Allocate DMA resources for the object.
Program the DMA engine on the device and start it (this is device specific).
When the transfer is complete, continue the bus master operation.
Perform any required object synchronizations.
Release the DMA resources.
Free the DMA handle.

First-Party DMA

In general, the following steps must be performed for first-party DMA.

Allocate a DMA channel.
Configure the channel with ddi_dmae_1stparty(9F).
Lock the DMA object in memory (see physio(9F)).

Note - This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.

Allocate DMA resources for the object.
Program the DMA engine on the device and start it (this is device specific).
When the transfer is complete, continue the bus-master operation.
Perform any required object synchronizations.
Release the DMA resources.
Deallocate the DMA channel.
Free the DMA handle.

Third-Party DMA

In general, the following steps must be performed for third-party DMA.

Allocate a DMA channel.
Retrieve the system's DMA engine attributes with ddi_dmae_getattr(9F).
Lock the DMA object in memory (see physio(9F)).

Note - This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.

Allocate DMA resources for the object.
Program the system DMA engine to perform the transfer with ddi_dmae_prog(9F).
Perform any required object synchronizations.
Stop the DMA engine with ddi_dmae_stop(9F).
Release the DMA resources.
Deallocate the DMA channel.
Free the DMA handle.

Certain hardware platforms may restrict DMA capabilities in a bus-specific way. Drivers should use ddi_slaveonly(9F) to determine if the device is in a slot in which DMA is possible. For an example, see "attach( )" on page 101.

DMA Attributes

DMA attributes describe the built-in attributes and limits of a DMA engine, including:

Limits on addresses the device can access
Maximum transfer count
Address alignment restrictions

To ensure that DMA resources allocated by the system can be accessed by the device's DMA engine, device drivers must inform the system of their DMA engine limitations using a ddi_dma_attr(9S) structure. The system may impose additional restrictions on the device attributes, but it never removes any of the driver-supplied restrictions.

`ddi_dma_attr(9S)`

The DMA attribute structure has the following members:

uint_t           dma_attr_version;    /* version number of this structure */
uint64_t         dma_attr_addr_lo;    /* lower bound of bus address range */
uint64_t         dma_attr_addr_hi;    /* inclusive upper bound of range */
uint64_t         dma_attr_count_max;  /* max DMA transfer count - 1 */
uint64_t         dma_attr_align;      /* DMA address alignment */
uint_t           dma_attr_burstsizes; /* DMA burstsize */
uint32_t         dma_attr_minxfer;    /* minimum DMA transfer size */
uint64_t         dma_attr_maxxfer;    /* max transfer size of a single I/O */
uint64_t         dma_attr_seg;        /* segment boundary restriction */
int              dma_attr_sgllen;     /* length of DMA scatter-gather list */
uint32_t         dma_attr_granular;   /* granularity of transfer count */
uint_t           dma_attr_flags;      /* additional DMA flags */

dma_attr_addr_lo is the lowest bus address that the DMA engine can access.

dma_attr_addr_hi is the highest bus address that the DMA engine can access.

dma_attr_count_max specifies the maximum transfer count that the DMA engine can handle in one cookie. The limit is expressed as the maximum count minus one. It is used as a bit mask, so it must also be one less than a power of two.
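The "one less than a power of two" requirement is easy to check mechanically. The helpers below are a small sketch with hypothetical names (is_pow2_minus_one(), count_max_for_bits()); they are not part of the DDI and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v has the form 2^n - 1 (all low bits set), else 0. */
int
is_pow2_minus_one(uint64_t v)
{
    return ((v & (v + 1)) == 0);
}

/* A counter register of `bits` bits (1..64) yields this count_max value. */
uint64_t
count_max_for_bits(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return (bits == 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1));
}
```

For instance, a 16-bit counter register gives 0xFFFF, which passes the check, whereas 0x10000 does not.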

dma_attr_align specifies additional alignment requirements for any allocated DMA resources. This field can be used to force more restrictive alignment than implicitly specified by other DMA attributes such as alignment on a page boundary.

dma_attr_burstsizes specifies the burst sizes that the device supports. A burst size is the amount of data the device can transfer before relinquishing the bus. This member is a binary encoding of burst sizes, assumed to be powers of two. For example, if the device is capable of doing 1-, 2-, 4-, and 16-byte bursts, this field should be set to 0x17. The system also uses this field to determine alignment restrictions.
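The encoding can be reproduced with a short helper. This is an illustrative sketch; burst_bitmap() is a hypothetical name, not a DDI routine. Because each burst size is a power of two, ORing the sizes together sets exactly one bit per supported size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build a burst-size bitmap from a list of burst sizes in bytes.
 * Each size must be a power of two; a size of 2^n sets bit n,
 * which is simply the size value itself ORed into the map.
 */
uint32_t
burst_bitmap(const uint32_t *sizes, size_t n)
{
    uint32_t map = 0;

    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] != 0 && (sizes[i] & (sizes[i] - 1)) == 0);
        map |= sizes[i];
    }
    return (map);
}
```

Passing the sizes 1, 2, 4, and 16 yields 0x17, matching the value given in the text.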

dma_attr_minxfer is the minimum effective transfer size the device can perform. It also influences alignment and padding restrictions.

dma_attr_maxxfer describes the maximum number of bytes that the DMA engine can transmit or receive in one I/O command. This limitation is only significant if it is less than (dma_attr_count_max + 1) * dma_attr_sgllen. If the DMA engine has no particular limitation, this field should be set to 0xFFFFFFFF.
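The following sketch works through when the maximum transfer size actually binds. It assumes the comparison bound is the per-cookie maximum multiplied by the number of scatter-gather entries; effective_maxxfer() is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective per-command byte limit: the smaller of maxxfer and
 * (count_max + 1) * sgllen. The product is guarded against overflow;
 * if it cannot be represented, only maxxfer can be the limit.
 */
uint64_t
effective_maxxfer(uint64_t count_max, uint64_t maxxfer, unsigned sgllen)
{
    uint64_t per_cookie, total;

    assert(sgllen >= 1);
    if (count_max == UINT64_MAX)
        return (maxxfer);       /* per-cookie limit cannot bind */
    per_cookie = count_max + 1;
    if (per_cookie > UINT64_MAX / sgllen)
        return (maxxfer);       /* product would overflow */
    total = per_cookie * sgllen;
    return (total < maxxfer ? total : maxxfer);
}
```

With a 16-bit counter and 17 scatter-gather entries, the product is 0x110000 bytes, so a maxxfer of 0xFFFFFFFF imposes no additional restriction in that case.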

dma_attr_seg is the upper bound of the DMA engine's address register. This is often used where the upper 8 bits of an address register are a latch containing a segment number, and the lower 24 bits are used to address a segment. In this case, dma_attr_seg would be set to 0xFFFFFF, and prevents the system from crossing a 24-bit segment boundary when allocating resources for the object.
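A boundary check of this kind can be expressed as a mask comparison. The sketch below uses a hypothetical helper, crosses_segment(), and assumes that seg is one less than a power of two (as in the 0xFFFFFF example) and that addr + len does not wrap.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the byte range [addr, addr + len) spans more than one
 * segment, where seg is the segment mask (for example 0xFFFFFF for
 * 24-bit segments). The bits above the mask identify the segment.
 */
int
crosses_segment(uint64_t addr, uint64_t len, uint64_t seg)
{
    assert(len > 0);
    return ((addr & ~seg) != ((addr + len - 1) & ~seg));
}
```

A range that ends exactly on the last byte of a segment does not cross the boundary; one byte more does.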

dma_attr_sgllen specifies the maximum number of entries in the scatter-gather list. It is the number of segments or cookies that the DMA engine can consume in one I/O request to the device. If the DMA engine has no scatter-gather list, this field should be set to one.

dma_attr_granular field describes the granularity of the device's DMA transfer ability, in units of bytes. This value is used to specify, for example, the sector size of a mass storage device. DMA requests will be broken into multiples of this value. If there is no scatter-gather capability, then the size of each DMA transfer will be a multiple of this value. If there is scatter-gather capability, then a single segment will not be smaller than the minimum transfer value, but may be less than the granularity; however the total transfer length of the scatter-gather list will be a multiple of the granularity value.
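As a rough illustration of the granularity rule for an engine without scatter-gather, the hypothetical helper below (granular_xfer_len(), not a DDI routine) picks the largest transfer length that fits the remaining data and an engine limit while remaining a multiple of the granularity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest transfer length that fits in both the remaining byte count
 * and the engine limit while staying a multiple of the granularity.
 * Returns 0 if not even one granule fits.
 */
uint64_t
granular_xfer_len(uint64_t remaining, uint64_t limit, uint32_t gran)
{
    uint64_t len = remaining < limit ? remaining : limit;

    assert(gran != 0);
    return (len - (len % gran));
}
```

With 512-byte sectors, a 3000-byte request under a 64-Kbyte limit is trimmed to 2560 bytes, that is, five whole sectors.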

dma_attr_flags is reserved for future use. It must be set to 0.

SBus--Example One

A DMA engine on an SBus in a SPARC machine has the following attributes:

It can only access addresses ranging from 0xFF000000 to 0xFFFFFFFF.
It has a 32-bit DMA counter register.
It can handle byte-aligned transfers.

It supports 1-, 2- and 4-byte burst sizes.
It has a minimum effective transfer size of 1 byte.
It has a 32-bit address register.
It doesn't have a scatter-gather list.
The device operates on sectors only (for example a disk).

The resulting attribute structure is:

static ddi_dma_attr_t attributes = {
    DMA_ATTR_V0,     /* Version number */
    0xFF000000,      /* low address */
    0xFFFFFFFF,      /* high address */
    0xFFFFFFFF,      /* counter register max */
    1,               /* byte alignment */
    0x7,             /* burst sizes: 0x1 | 0x2 | 0x4 */
    0x1,             /* minimum transfer size */
    0xFFFFFFFF,      /* max xfer size */
    0xFFFFFFFF,      /* address register max */
    1,               /* no scatter-gather */
    512,             /* device operates on sectors */
    0,               /* attr flag: set to 0 */
};

VMEbus--Example Two

A DMA engine on a VMEbus in a SPARC machine has the following attributes:

It can address the full 32-bit range.
It has a 32-bit DMA counter register.
It can handle byte-aligned transfers.
It supports 2- to 256-byte burst sizes, and all powers of 2 in between.
It has a minimum effective transfer size of 2 bytes.
It has a 24-bit address register.
It has a 17-element scatter-gather list.
The device operates on sectors only.

The resulting attribute structure is:

static ddi_dma_attr_t attributes = {
    DMA_ATTR_V0,     /* Version number */
    0x00000000,      /* low address */
    0xFFFFFFFF,      /* high address */
    0xFFFFFFFF,      /* counter register max */
    1,               /* byte alignment */
    0x1FE,           /* burst sizes */
    0x2,             /* minimum transfer size */
    0xFFFFFFFF,      /* max xfer size */
    0xFFFFFF,        /* address register max */
    17,              /* scatter-gather list length */
    512,             /* device operates on sectors */
    0,               /* attr flag: set to 0 */
};

ISAbus--Example Three

A DMA engine on an ISA bus in an x86 machine has the following attributes:

It accesses only the first 16 megabytes of memory.
It performs transfers to segments up to 32 Kbytes in size.
It has a 16-bit counter register.
It can handle byte-aligned transfers.
It supports 1-, 2- and 4-byte burst sizes.
It has a minimum effective transfer size of 1 byte.
It can hold up to 17 scatter-gather transfers.

The resulting attribute structure is:

static ddi_dma_attr_t attributes = {
    DMA_ATTR_V0,     /* Version number */
    0x00000000,      /* low address */
    0x00FFFFFF,      /* high address */
    0xFFFF,          /* counter register max */
    1,               /* byte alignment */
    0x7,             /* burst sizes */
    0x1,             /* minimum transfer size */
    0xFFFFFFFF,      /* max xfer size */
    0x00007FFF,      /* address register max */
    17,              /* scatter-gather list length */
    512,             /* device operates on sectors */
    0,               /* attr flag: set to 0 */
};

Object Locking

Before allocating the DMA resources for a memory object, the object must be prevented from moving. If it is not, the system may remove the object from memory while the device is writing to it, causing the data transfer to fail and possibly corrupting the system. The process of preventing memory objects from moving during a DMA transfer is known as locking down the object.

Note - Locking objects in memory is not related to the type of locking used to protect data.

The following object types do not require explicit locking:

Buffers coming from the file system through strategy(9E). These buffers are already locked by the file system.
Kernel memory allocated within the device driver, such as that allocated by ddi_dma_mem_alloc(9F).

For other objects (such as buffers from user space), physio(9F) must be used to lock down the objects. This is usually performed in the read(9E) or write(9E) routines of a character device driver. See "Data Transfer Methods" on page 187 for an example.

Allocating a DMA Handle

A DMA handle is an opaque object that is used as a reference to subsequently allocated DMA resources. It is usually allocated in the driver's attach entry point using ddi_dma_alloc_handle(9F). ddi_dma_alloc_handle(9F) takes the device information referred to by dip and the device's DMA attributes described by a ddi_dma_attr(9S) structure as parameters.

int ddi_dma_alloc_handle(dev_info_t *dip, ddi_dma_attr_t *attr,
    int (*waitfp)(caddr_t), caddr_t arg,
    ddi_dma_handle_t *handlep);

dip is a pointer to the device's dev_info structure.

attr is a pointer to a ddi_dma_attr(9S) structure as described in "DMA Attributes" on page 132.

waitfp is the address of a callback function for handling resource allocation failures.

arg is the argument to pass to the callback function.

handlep is a pointer to a DMA handle (to store the returned handle).

Handling Resource Allocation Failures

The resource-allocation routines provide the driver several options when handling allocation failures. The waitfp argument indicates whether the allocation routines will block, return immediately, or schedule a callback.

waitfp	Indicated Action
`DDI_DMA_DONTWAIT`	Driver does not need to wait for resources to become available.
`DDI_DMA_SLEEP`	Driver is willing to wait indefinitely for resources to become available.
Other values	The address of a function to be called when resources are likely to be available.

Allocating DMA Resources

Two interfaces allocate DMA resources:

ddi_dma_buf_bind_handle(9F) - Used with buffer structures.
ddi_dma_addr_bind_handle(9F) - Used with virtual addresses.

Table 7-1 lists the appropriate DMA resource allocation interfaces for different classes of DMA objects.

***Table 7-1*** DMA Resource Allocation Interfaces
Type of Object	Resource Allocation Interface
Memory allocated within the driver using `ddi_dma_mem_alloc`(9F)	`ddi_dma_addr_bind_handle(9F)`
Requests from the file system through `strategy`(9E)	`ddi_dma_buf_bind_handle(9F)`
Memory in user space that has been locked down using `physio`(9F)	`ddi_dma_buf_bind_handle(9F)`

DMA resources are usually allocated in the driver's xxstart( ) routine, if one exists. See "Asynchronous Data Transfers" on page 220 for discussion of xxstart( ).

    int ddi_dma_addr_bind_handle(ddi_dma_handle_t handle,
        struct as *as, caddr_t addr,
        size_t len, uint_t flags, int (*waitfp)(caddr_t),
        caddr_t arg, ddi_dma_cookie_t *cookiep, uint_t *ccountp);
    int ddi_dma_buf_bind_handle(ddi_dma_handle_t handle,
        struct buf *bp, uint_t flags,
        int (*waitfp)(caddr_t), caddr_t arg,
        ddi_dma_cookie_t *cookiep, uint_t *ccountp);

ddi_dma_addr_bind_handle(9F) and ddi_dma_buf_bind_handle(9F) take the following arguments:

handle is a DMA handle.

The object to allocate resources for.

For ddi_dma_addr_bind_handle(9F), the object is described by an address range, where as is a pointer to an address space structure (this must be NULL), addr is the base kernel address of the object, and len is the length of the object in bytes.
For ddi_dma_buf_bind_handle(9F), the object is described by a buf(9S) structure pointed to by bp.

flags is a set of flags indicating the transfer direction and other attributes. DDI_DMA_READ indicates a data transfer from device to memory. DDI_DMA_WRITE indicates a data transfer from memory to device. See ddi_dma_addr_bind_handle(9F) or ddi_dma_buf_bind_handle(9F) for a complete discussion of the allowed flags.

waitfp is the address of a callback function for handling resource allocation failures. See ddi_dma_alloc_handle(9F).

arg is the argument to pass to the callback function.

cookiep is a pointer to the first DMA cookie for this object.

ccountp is a pointer to the number of DMA cookies for this object.

State Structure

This section adds the following fields to the state structure. See "Software State Structure" on page 63 for more information.

struct buf            *bp;         /* current transfer */
ddi_dma_handle_t      handle;
struct xxiopb         *iopb_array;/* for I/O Parameter Blocks */
ddi_dma_handle_t      iopb_handle;

Device Register Structure

Devices that do DMA have more registers than have been used in previous examples. This section adds the following fields to the device register structure to support DMA-capable device examples.

For DMA engines without scatter-gather support:

    uint32_t         dma_addr;     /* starting address for DMA */
    uint32_t         dma_size;     /* amount of data to transfer */

For DMA engines with scatter-gather support:

struct sglentry {
    uint32_t         dma_addr;
    uint32_t         dma_size;
} sglist[SGLLEN];
caddr_t  iopb_addr;/* When written informs device of the next */
                      /* command's parameter block address. */
                      /* When read after an interrupt,contains */
                      /* the address of the completed command. */

DMA Callback Example

In Code Example 7-1 on page 141, xxstart( ) is used as the callback function and the per-device state structure is given as its argument. xxstart( ) attempts to start the command. If the command cannot be started because resources are not available, xxstart( ) is scheduled to be called sometime later, when resources might be available.

Because xxstart( ) is used as a DMA callback, it must follow these rules imposed on DMA callbacks:

It must not assume that resources are available (it must try to allocate them again).

It must indicate to the system whether allocation succeeded. It returns DDI_DMA_CALLBACK_RUNOUT if it fails to allocate resources (and needs to be called again later), or DDI_DMA_CALLBACK_DONE to indicate success (so no further callback is necessary).
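The two rules can be seen in a small user-space model of the retry protocol. Everything in it is an assumption made for illustration: model_t, model_start(), model_run_callbacks(), and the CB_RUNOUT and CB_DONE constants merely stand in for the driver state, the start routine, the system's callback scheduling, and the DDI return codes.

```c
#include <assert.h>

/* Stand-ins for the DDI return codes; values are local to this model. */
enum { CB_RUNOUT = 0, CB_DONE = 1 };

/* Hypothetical resource pool and per-device state. */
typedef struct {
    int free_units;     /* resources currently available */
    int started;        /* number of transfers started */
} model_t;

/* Models a start routine used as a callback: always re-attempts allocation. */
int
model_start(model_t *m)
{
    if (m->free_units == 0)
        return (CB_RUNOUT);     /* ask to be called again later */
    m->free_units--;            /* "allocate" and start the transfer */
    m->started++;
    return (CB_DONE);           /* no further callback needed */
}

/*
 * Models the system side: a resource becomes free after `avail_after`
 * failed attempts, and the callback is re-invoked until it reports done.
 * Returns the number of invocations it took.
 */
int
model_run_callbacks(model_t *m, int avail_after)
{
    int calls = 0;

    assert(avail_after >= 0);
    for (;;) {
        if (calls == avail_after)
            m->free_units++;    /* another consumer released a unit */
        calls++;
        if (model_start(m) == CB_DONE)
            return (calls);
    }
}
```

The start routine never assumes that being called means resources exist; it simply tries again and reports the outcome, which is the behavior the rules above require.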

Code Example 7-1 Allocating DMA Resources

static int
xxstart(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    struct device_reg *regp;
    ddi_dma_cookie_t cookie;
    uint_t ccount;
    int flags;

    mutex_enter(&xsp->mu);
    if (xsp->busy) {
        /* transfer in progress */
        mutex_exit(&xsp->mu);
        return (0);
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);
    regp = xsp->regp;
    if (xsp->bp->b_flags & B_READ) {
        flags = DDI_DMA_READ;
    } else {
        flags = DDI_DMA_WRITE;
    }
    if (ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags, xxstart,
        (caddr_t)xsp, &cookie, &ccount) != DDI_DMA_MAPPED) {
        /* really should check all return values in a switch */
        mutex_enter(&xsp->mu);
        xsp->busy = 0;      /* nothing was started; allow a retry */
        mutex_exit(&xsp->mu);
        return (DDI_DMA_CALLBACK_RUNOUT);
    }
    /* ... program the DMA engine through regp (device specific) ... */
    return (DDI_DMA_CALLBACK_DONE);
}

Burst Sizes

Drivers specify the DMA burst sizes their device supports in the dma_attr_burstsizes field of the ddi_dma_attr(9S) structure. This is a bitmap of the supported burst sizes. However, when DMA resources are allocated, the system might impose further restrictions on the burst sizes that may actually be used by the device. The ddi_dma_burstsizes(9F) routine can be used to obtain the allowed burst sizes. It returns the appropriate burst size bitmap for the device. When DMA resources are allocated, a driver can ask the system for appropriate burst sizes to use for its DMA engine.

#define BEST_BURST_SIZE 0x20 /* 32 bytes */

if (ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags, xxstart,
    (caddr_t)xsp, &cookie, &ccount) != DDI_DMA_MAPPED) {
    /* error handling */
    return (0);
}
burst = ddi_dma_burstsizes(xsp->handle);
/* check which bit is set and choose one burstsize to */
/* program the DMA engine */
if (burst & BEST_BURST_SIZE) {
    /* program DMA engine to use this burst size */
} else {
    /* other cases */
}

Programming the DMA Engine

When the resources have been successfully allocated, the device must be programmed. Although programming a DMA engine is device specific, all DMA engines require a starting address and a transfer count. Device drivers retrieve these two values from the DMA cookie returned by a successful call to ddi_dma_addr_bind_handle(9F), ddi_dma_buf_bind_handle(9F), or ddi_dma_getwin(9F). These functions all return the first DMA cookie and a cookie count indicating whether the DMA object consists of more than one cookie. If the cookie count N is greater than 1, ddi_dma_nextcookie(9F) has to be called N-1 times to retrieve all the remaining cookies.
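The "first cookie plus N-1 further calls" pattern can be modeled outside the kernel. In the sketch below, cookie_t, binding_t, model_bind(), and model_nextcookie() are hypothetical stand-ins for the cookie structure, the bound handle, and the bind and next-cookie routines; only the iteration pattern is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cookie: one contiguous piece of the DMA object. */
typedef struct {
    uint64_t addr;
    size_t size;
} cookie_t;

/* Hypothetical bound handle: the cookie list plus an iteration cursor. */
typedef struct {
    const cookie_t *list;
    size_t count;
    size_t next;
} binding_t;

/* Models a bind call: hands back the first cookie and the cookie count. */
size_t
model_bind(binding_t *b, cookie_t *first)
{
    assert(b->count >= 1);
    *first = b->list[0];
    b->next = 1;
    return (b->count);
}

/* Models fetching the next cookie; valid only while cookies remain. */
void
model_nextcookie(binding_t *b, cookie_t *c)
{
    assert(b->next < b->count);
    *c = b->list[b->next++];
}

/* Sum the sizes of all cookies: one bind plus N-1 next-cookie calls. */
uint64_t
total_bytes(binding_t *b)
{
    cookie_t c;
    size_t n = model_bind(b, &c);
    uint64_t total = c.size;

    for (size_t i = 1; i < n; i++) {
        model_nextcookie(b, &c);
        total += c.size;
    }
    return (total);
}
```

Calling the next-cookie routine an N-th time would run past the end of the list, which the assertion in the model makes explicit.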

A cookie is of type ddi_dma_cookie(9S) and has the following fields:

uint64_t         dmac_laddress; /* unsigned 64-bit address */
uint32_t         dmac_address;     /* unsigned 32-bit address */
size_t           dmac_size;        /* transfer size */
u_int            dmac_type;        /* bus-specific type bits */

The dmac_laddress field specifies a 64-bit I/O address appropriate for programming the device's DMA engine. If a device has a 64-bit DMA address register, a driver should use this field to program the DMA engine. The dmac_address field specifies a 32-bit I/O address. It should be used for devices that have a 32-bit DMA address register. dmac_size contains the transfer count. Depending on the bus architecture, the dmac_type field in the cookie may be required by the driver. The driver should not perform any manipulations, such as logical or arithmetic, on the cookie.

For example:

ddi_dma_cookie_t      cookie;

if (ddi_dma_buf_bind_handle(xsp->handle,xsp->bp,flags, xxstart,
    (caddr_t)xsp, &cookie, &xsp->ccount) != DDI_DMA_MAPPED) {
        /* error handling */
        return (0);
}
sglp = regp->sglist;
for (cnt = 1; cnt <= SGLLEN; cnt++, sglp++) {
    /* store the cookie parms into the S/G list */
    ddi_put32(xsp->access_hdl, &sglp->dma_size,
        (uint32_t)cookie.dmac_size);
    ddi_put32(xsp->access_hdl, &sglp->dma_addr,
        cookie.dmac_address);
    /* Check for end of cookie list */
    if (cnt == xsp->ccount)
        break;
    /* Get next DMA cookie */
    (void) ddi_dma_nextcookie(xsp->handle, &cookie);
}
/* start DMA transfer */
ddi_put8(xsp->access_hdl, &regp->csr,
    ENABLE_INTERRUPTS | START_TRANSFER);

Note - ddi_dma_buf_bind_handle(9F) may return more DMA cookies than fit into the scatter-gather list. In this case, the driver has to continue the transfer in the interrupt routine and reprogram the scatter-gather list with the remaining DMA cookies.
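One way to picture the continuation described in the note is as batching: each pass loads as many cookies as the list can hold, and the interrupt routine triggers the next pass. The sketch below is a user-space model with assumed names (sg_pair_t, fill_sgl(), and a four-entry SGL_SLOTS); it is not tied to any particular device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SGL_SLOTS 4     /* assumed hardware scatter-gather list length */

/* Hypothetical address/size pair, used for both cookies and S/G slots. */
typedef struct {
    uint32_t addr;
    uint32_t size;
} sg_pair_t;

/*
 * Load the next batch of cookies into the S/G list.
 * *pos is the index of the first cookie not yet programmed; it is
 * advanced past the cookies consumed. Returns the entries filled,
 * which is 0 once every cookie has been programmed.
 */
size_t
fill_sgl(const sg_pair_t *cookies, size_t ccount, size_t *pos,
    sg_pair_t sgl[SGL_SLOTS])
{
    size_t n = 0;

    assert(*pos <= ccount);
    while (n < SGL_SLOTS && *pos < ccount)
        sgl[n++] = cookies[(*pos)++];
    return (n);
}
```

In a driver, the position would be kept in the state structure so that the interrupt routine can resume where the previous batch ended.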

Freeing the DMA Resources

After a DMA transfer is completed (usually in the interrupt routine), the DMA resources may be released by calling ddi_dma_unbind_handle(9F).

As described in "Synchronizing Memory Objects" on page 147, ddi_dma_unbind_handle(9F) calls ddi_dma_sync(9F), eliminating the need for any explicit synchronization. After calling ddi_dma_unbind_handle(9F), the DMA resources become invalid, and further references to them have undefined results. Code Example 7-2 on page 144 shows how to use ddi_dma_unbind_handle(9F).

Code Example 7-2 Freeing DMA Resources

static u_int
xxintr(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    uint8_t status, temp;
    mutex_enter(&xsp->mu);
    /* read status */
    status = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
    if (!(status & INTERRUPTING)) {
        mutex_exit(&xsp->mu);
        return (DDI_INTR_UNCLAIMED);
    }
    ddi_put8(xsp->access_hdl, &xsp->regp->csr, CLEAR_INTERRUPT);
    /* for store buffers */
    temp = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
    ddi_dma_unbind_handle(xsp->handle);
    ...
    check for errors
    ...
    xsp->busy = 0;
    mutex_exit(&xsp->mu);
    if (pending transfers) {
        (void) xxstart((caddr_t)xsp);
    }
    return (DDI_INTR_CLAIMED);
}

The DMA resources should be released and reallocated if a different object will be used in the next transfer. However, if the same object is always used, the resources may be allocated once and continually reused as long as there are intervening calls to ddi_dma_sync(9F).

Freeing the DMA Handle

When the driver is unloaded, the DMA handle must be freed. ddi_dma_free_handle(9F) destroys the DMA handle and any residual resources the system may be caching on the handle. Any further references to the DMA handle will have undefined results. In ddi_dma_free_handle(9F), handlep is a pointer to the DMA handle.

void ddi_dma_free_handle(ddi_dma_handle_t *handlep);

Canceling DMA Callbacks

DMA callbacks cannot be canceled. This requires some additional code in the driver's detach(9E) routine, as it must not return DDI_SUCCESS if there are any outstanding callbacks. (See Code Example 7-3.) When DMA callbacks occur, the detach(9E) routine must wait for the callback to run and must prevent it from rescheduling itself. This can be done using additional fields in the state structure:

    int          cancel_callbacks;     /* detach(9E) sets this to */
                                        /* prevent callbacks from */
                                        /* rescheduling themselves */
    int          callback_count;       /* number of outstanding */
                                        /* callbacks */
    kmutex_t     callback_mutex;       /* protects callback_count and */
                                        /* cancel_callbacks. */
    kcondvar_t   callback_cv;          /* condition is that */
                                        /* callback_count is zero */
                                        /* detach(9E) waits on it */

Code Example 7-3 Canceling DMA Callbacks

static int
xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
    ...
    mutex_enter(&xsp->callback_mutex);
    xsp->cancel_callbacks = 1;
    while (xsp->callback_count > 0) {
        cv_wait(&xsp->callback_cv, &xsp->callback_mutex);
    }
    mutex_exit(&xsp->callback_mutex);
    ...
}

static int
xxstrategy(struct buf *bp)
{
    ...
    mutex_enter(&xsp->callback_mutex);
    xsp->bp = bp;
    error = ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags,
                 xxdmacallback, (caddr_t)xsp, &cookie, &ccount);
    if (error == DDI_DMA_NORESOURCES)
        xsp->callback_count++;
    mutex_exit(&xsp->callback_mutex);
    ...
}

static int
xxdmacallback(caddr_t callbackarg)
{
    struct xxstate *xsp = (struct xxstate *)callbackarg;
    ...
    mutex_enter(&xsp->callback_mutex);
    if (xsp->cancel_callbacks) {
        /* do not reschedule, in process of detaching */
        xsp->callback_count--;
        if (xsp->callback_count == 0)
             cv_signal(&xsp->callback_cv);
        mutex_exit(&xsp->callback_mutex);
        return (DDI_DMA_CALLBACK_DONE);/* don't reschedule it */
    }
    /*
     * Presumably at this point the device is still active
     * and will not be detached until the DMA has completed.
     * A return of 0 means try again later
     */
    error = ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags,
                 DDI_DMA_DONTWAIT, NULL, &cookie, &ccount);
    if (error == DDI_DMA_MAPPED) {
        ...
        program the DMA engine
        ...
        xsp->callback_count--;
        mutex_exit(&xsp->callback_mutex);
        return (DDI_DMA_CALLBACK_DONE);
    }
    if (error != DDI_DMA_NORESOURCES) {
        xsp->callback_count--;
        mutex_exit(&xsp->callback_mutex);
        return (DDI_DMA_CALLBACK_DONE);
    }
    mutex_exit(&xsp->callback_mutex);
    return (DDI_DMA_CALLBACK_RUNOUT);
}

Synchronizing Memory Objects

At various points when the memory object is accessed (including the time of removal of the DMA resources), the driver may need to synchronize the memory object with respect to various caches. This section gives guidelines on when and how to synchronize memory objects.

Cache

A cache is a very high-speed memory that sits between the CPU and the system's main memory (CPU cache), or between a device and the system's main memory (I/O cache), as shown in Figure 7-1.

Figure 7-1 CPU and System I/O Caches

When an attempt is made to read data from main memory, the associated cache first determines whether it contains the requested data. If so, it quickly satisfies the request. If the cache does not have the data, it retrieves the data from main memory, passes the data on to the requestor, and saves the data in case that data is requested again.

Similarly, on a write cycle, the data is stored in the cache quickly and the CPU or device is allowed to continue executing (transferring). This takes much less time than it otherwise would if the CPU or device had to wait for the data to be written to memory.

An implication of this model is that after a device transfer has been completed, the data may still be in the I/O cache but not yet in main memory. If the CPU accesses the memory, it may read the wrong data from the CPU cache. To ensure a consistent view of the memory for the CPU, the driver must call a synchronization routine to write the data from the I/O cache to main memory and update the CPU cache with the new data. Similarly, a synchronization step is required if data modified by the CPU is to be accessed by a device.

There may also be additional caches and buffers in between the device and memory, such as caches associated with bus extenders or bridges. ddi_dma_sync(9F) is provided to synchronize all applicable caches.
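The stale-read scenario described above can be modeled in user space. The following is a minimal sketch, not driver code: struct toy_system and the functions device_write, cpu_read, sync_for_cpu, and toy_demo are hypothetical names that model a single memory word with a write-back I/O cache and a CPU cache. sync_for_cpu only imitates the effect this section attributes to a synchronization routine; it is not how any particular platform implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): one memory word plus two caches. */
struct toy_system {
    uint32_t memory;     /* contents of main memory */
    uint32_t io_cache;   /* data held in the I/O cache */
    bool     io_dirty;   /* I/O cache holds data not yet in memory */
    uint32_t cpu_cache;  /* data held in the CPU cache */
    bool     cpu_valid;  /* CPU cache holds a copy of the word */
};

/* A device write lands in the I/O cache, not in main memory. */
static void device_write(struct toy_system *s, uint32_t value)
{
    s->io_cache = value;
    s->io_dirty = true;
}

/* A CPU read is served from the CPU cache on a hit, else from memory. */
static uint32_t cpu_read(struct toy_system *s)
{
    if (!s->cpu_valid) {
        s->cpu_cache = s->memory;
        s->cpu_valid = true;
    }
    return s->cpu_cache;
}

/* Imitates synchronizing for the CPU: flush I/O cache, invalidate CPU cache. */
static void sync_for_cpu(struct toy_system *s)
{
    if (s->io_dirty) {
        s->memory = s->io_cache;
        s->io_dirty = false;
    }
    s->cpu_valid = false;
}

/* Walks through the scenario; returns what the CPU sees after the sync. */
static uint32_t toy_demo(void)
{
    struct toy_system s = { .memory = 1 };

    (void) cpu_read(&s);           /* CPU caches the old value */
    device_write(&s, 2);           /* device transfer completes */
    assert(cpu_read(&s) == 1);     /* stale: CPU cache hit */
    sync_for_cpu(&s);
    return cpu_read(&s);           /* now refilled from main memory */
}
```

In this model the CPU keeps reading the old value until sync_for_cpu runs, which is the situation the driver avoids by synchronizing before it touches the transferred data.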

ddi_dma_sync()

If a memory object has multiple mappings, such as one for a device (through the DMA handle) and one for the CPU, and one mapping is used to modify the memory object, the driver needs to call ddi_dma_sync(9F) to ensure that the modification is complete before the object is accessed through another mapping. ddi_dma_sync(9F) may also inform other mappings of the object that any cached references to the object are now stale. Additionally, ddi_dma_sync(9F) flushes or invalidates stale cache references as necessary.

Generally, the driver has to call ddi_dma_sync(9F) when a DMA transfer completes. The exception to this is that deallocating the DMA resources (ddi_dma_unbind_handle(9F)) does an implicit ddi_dma_sync(9F) on behalf of the driver.

int ddi_dma_sync(ddi_dma_handle_t handle, off_t off,
    size_t length, u_int type);

If the object is going to be read by the DMA engine of the device, the device's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORDEV. If the DMA engine of the device has written to the memory object, and the object is going to be read by the CPU, the CPU's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORCPU.

Here is an example of synchronizing a DMA object for the CPU:

if (ddi_dma_sync(xsp->handle, 0, length, DDI_DMA_SYNC_FORCPU)
    == DDI_SUCCESS) {
    /* the CPU can now access the transferred data */
    ...
} else {
    /* error handling */
}

If the only mapping that concerns the driver is one for the kernel (such as memory allocated by ddi_dma_mem_alloc(9F)), the flag DDI_DMA_SYNC_FORKERNEL can be used. This is a hint to the system that if it can synchronize the kernel's view faster than the CPU's view, it can do so; otherwise, it acts the same as DDI_DMA_SYNC_FORCPU.
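The choice among the three type values can be summarized as a small decision function. This is a sketch under stated assumptions: enum sync_type, enum accessor, and pick_sync_type are hypothetical stand-ins written for illustration, and the enumerators are not the kernel's DDI_DMA_SYNC_* definitions. The SYNC_NONE case (same side wrote and reads) is an inference from the rule above that synchronization is needed when a different mapping accesses the object.

```c
#include <assert.h>

/* Stand-ins for the DDI_DMA_SYNC_* values; not the kernel definitions. */
enum sync_type { SYNC_NONE, SYNC_FORDEV, SYNC_FORCPU, SYNC_FORKERNEL };

/* Which side modified the object last, and which side reads it next. */
enum accessor { ACC_CPU, ACC_DEVICE };

/*
 * Pick a synchronization type for the next access.
 * kernel_only is nonzero when the only CPU-side mapping the driver
 * cares about is the kernel's own mapping.
 */
static enum sync_type
pick_sync_type(enum accessor last_writer, enum accessor next_reader,
    int kernel_only)
{
    if (last_writer == next_reader)
        return SYNC_NONE;       /* same mapping, nothing to reconcile */
    if (next_reader == ACC_DEVICE)
        return SYNC_FORDEV;     /* CPU wrote, DMA engine will read */
    return kernel_only ? SYNC_FORKERNEL : SYNC_FORCPU;
}

/* Usage: the cases described in the text. */
static void pick_sync_type_example(void)
{
    assert(pick_sync_type(ACC_CPU, ACC_DEVICE, 0) == SYNC_FORDEV);
    assert(pick_sync_type(ACC_DEVICE, ACC_CPU, 0) == SYNC_FORCPU);
    assert(pick_sync_type(ACC_DEVICE, ACC_CPU, 1) == SYNC_FORKERNEL);
}
```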

DMA Windows

The system might be unable to allocate resources for a large object. If this occurs, the transfer must be broken into a series of smaller transfers. The driver can either do this itself, or it can let the system allocate resources for only part of the object, thereby creating a series of DMA windows. Allowing the system to allocate resources is the preferred solution, as the system can manage the resources more effectively than the driver.

A DMA window has two attributes: an offset (from the beginning of the object) and a length. After a partial allocation, only a range of length bytes starting at offset has resources allocated for it.
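To make the offset and length attributes concrete, the following sketch computes window geometry in user space. It assumes, purely for illustration, that every window except possibly the last has the same size; in practice the system chooses the windows and the driver obtains them through the DMA routines. struct window, window_count, and window_at are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one DMA window. */
struct window {
    size_t offset;   /* from the beginning of the object */
    size_t length;   /* bytes that have resources allocated */
};

/* Windows needed to cover obj_len bytes, win_len bytes at a time. */
static size_t window_count(size_t obj_len, size_t win_len)
{
    assert(win_len > 0);
    return (obj_len + win_len - 1) / win_len;
}

/* Geometry of window number index; the last window may be shorter. */
static struct window window_at(size_t obj_len, size_t win_len, size_t index)
{
    struct window w;

    assert(index < window_count(obj_len, win_len));
    w.offset = index * win_len;
    w.length = (obj_len - w.offset < win_len) ?
        obj_len - w.offset : win_len;
    return w;
}
```

For a 10000-byte object and 4096-byte windows, this yields three windows, the last one starting at offset 8192 with a length of 1808 bytes.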

A DMA window is requested by specifying the DDI_DMA_PARTIAL flag as a parameter to ddi_dma_buf_bind_handle(9F) or ddi_dma_addr_bind_handle(9F). Both functions return DDI_DMA_PARTIAL_MAP if a window can be established. However, the system might instead allocate resources for the entire object (which involves less overhead), in which case DDI_DMA_MAPPED is returned. The driver should check the return value (see Code Example 7-4) to determine whether DMA windows are in use.

State Structure

This section adds the following fields to the state structure. See "Software State Structure" for more information.

    int partial;          /* DMA object partially mapped, use windows */
    int nwin;             /* number of DMA windows for this object */
    int windex;           /* index of the current active window */

Code Example 7-4 Setting Up DMA Windows

static int
xxstart(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    struct device_reg *regp = xsp->reg;
    ddi_dma_cookie_t cookie;
    uint_t ccount;
    uint_t flags;
    int status;

    mutex_enter(&xsp->mu);
    if (xsp->busy) {
        /* transfer in progress */
        mutex_exit(&xsp->mu);
        return (DDI_DMA_CALLBACK_RUNOUT);
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);

    if (xsp->bp->b_flags & B_READ) {
        flags = DDI_DMA_READ;
    } else {
        flags = DDI_DMA_WRITE;
    }
    flags |= DDI_DMA_PARTIAL;
    status = ddi_dma_buf_bind_handle(xsp->handle, xsp->bp,
        flags, xxstart, (caddr_t)xsp, &cookie, &ccount);
    if (status != DDI_DMA_MAPPED &&
        status != DDI_DMA_PARTIAL_MAP) {
        /* clear busy so that a later call to xxstart can retry */
        mutex_enter(&xsp->mu);
        xsp->busy = 0;
        mutex_exit(&xsp->mu);
        return (DDI_DMA_CALLBACK_RUNOUT);
    }
    if (status == DDI_DMA_PARTIAL_MAP) {
        (void) ddi_dma_numwin(xsp->handle, &xsp->nwin);
        xsp->partial = 1;
        xsp->windex = 0;
    } else {
        xsp->partial = 0;
    }
    ...
    /* program the DMA engine */
    ...
    return (DDI_DMA_CALLBACK_DONE);
}

Two functions operate on DMA windows. The first, ddi_dma_numwin(9F), returns the number of DMA windows for a particular DMA object. The other function, ddi_dma_getwin(9F), allows repositioning (reallocation of system resources) within the object. It shifts the current window to a new window within the object. Because ddi_dma_getwin(9F) reallocates system resources to the new window, the previous window becomes invalid.

Caution - It is a severe error to call ddi_dma_getwin(9F) before transfers into the current window are complete.

ddi_dma_getwin(9F) is normally called from an interrupt routine; see Code Example 7-5. The first DMA transfer is initiated as a result of a call to the driver. Subsequent transfers are started from the interrupt routine.

The interrupt routine examines the status of the device to determine if the device completed the transfer successfully. If not, normal error recovery occurs. If the transfer was successful, the routine must determine if the logical transfer is complete (the entire transfer specified by the buf(9S) structure) or if this was only one DMA window. If it was only one window, it moves the window with ddi_dma_getwin(9F), retrieves a new cookie, and starts another DMA transfer.

If the logical request has been completed, the interrupt routine checks for pending requests and starts a transfer, if necessary. Otherwise, it returns without invoking another DMA transfer. Code Example 7-5 illustrates the usual flow control.

Code Example 7-5 Interrupt Handler Using DMA Windows

static u_int
xxintr(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    uint8_t status, temp;
    ddi_dma_cookie_t cookie;
    off_t offset;
    size_t len;
    uint_t ccount;

    mutex_enter(&xsp->mu);
    /* read status */
    status = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
    if (!(status & INTERRUPTING)) {
        mutex_exit(&xsp->mu);
        return (DDI_INTR_UNCLAIMED);
    }
    ddi_put8(xsp->access_hdl, &xsp->regp->csr, CLEAR_INTERRUPT);
    /* for store buffers */
    temp = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
    if (/* an error occurred during transfer */) {
        bioerror(xsp->bp, EIO);
        xsp->partial = 0;
    } else {
        xsp->bp->b_resid -= /* amount transferred */;
    }

    if (xsp->partial && (++xsp->windex < xsp->nwin)) {
        /* device still marked busy to protect state */
        mutex_exit(&xsp->mu);
        (void) ddi_dma_getwin(xsp->handle, xsp->windex,
            &offset, &len, &cookie, &ccount);
        ...
        /* program the DMA engine with the new cookie(s) */
        ...
        return (DDI_INTR_CLAIMED);
    }
    (void) ddi_dma_unbind_handle(xsp->handle);
    biodone(xsp->bp);
    xsp->busy = 0;
    xsp->partial = 0;
    mutex_exit(&xsp->mu);

    if (/* pending transfers */) {
        (void) xxstart((caddr_t)xsp);
    }
    return (DDI_INTR_CLAIMED);
}
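The window bookkeeping in the interrupt handler can be exercised without hardware. The sketch below is a user-space model, assuming a hypothetical struct toy_state that mirrors only the busy, partial, nwin, and windex fields plus a residual byte count; toy_intr and enum intr_result are likewise illustrative names, and no DDI routine is called.

```c
#include <assert.h>
#include <stddef.h>

/* What the modeled handler decides to do next (hypothetical). */
enum intr_result { START_NEXT_WINDOW, REQUEST_DONE };

/* Hypothetical subset of the driver state used by the window logic. */
struct toy_state {
    int    busy;      /* a logical transfer is in progress */
    int    partial;   /* object partially mapped, windows in use */
    int    nwin;      /* number of windows for this object */
    int    windex;    /* index of the current window */
    size_t resid;     /* bytes of the request not yet transferred */
    int    failed;    /* set when the device reported an error */
};

/* Models the decision made after one window's transfer completes. */
static enum intr_result
toy_intr(struct toy_state *sp, size_t transferred, int error)
{
    if (error) {
        sp->failed = 1;
        sp->partial = 0;            /* abandon the remaining windows */
    } else {
        assert(transferred <= sp->resid);
        sp->resid -= transferred;
    }
    if (sp->partial && (++sp->windex < sp->nwin))
        return START_NEXT_WINDOW;   /* still busy; move the window */
    sp->busy = 0;
    sp->partial = 0;
    return REQUEST_DONE;
}

/* Usage: a 10000-byte request split into three windows. */
static void toy_intr_example(void)
{
    struct toy_state s = { .busy = 1, .partial = 1, .nwin = 3,
        .windex = 0, .resid = 10000 };

    assert(toy_intr(&s, 4096, 0) == START_NEXT_WINDOW);
    assert(toy_intr(&s, 4096, 0) == START_NEXT_WINDOW);
    assert(toy_intr(&s, 1808, 0) == REQUEST_DONE);
    assert(s.resid == 0 && s.busy == 0);
}
```

As in the handler, an error clears partial, so the request finishes immediately instead of advancing to the next window.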

Allocating Private DMA Buffers

Some device drivers may need to allocate memory for DMA transfers to or from a device, in addition to doing transfers requested by user threads and the kernel. Examples of this are setting up shared memory for communication with the device and allocating intermediate transfer buffers. ddi_dma_mem_alloc(9F) is provided for allocating memory for DMA transfers.

int ddi_dma_mem_alloc(ddi_dma_handle_t handle, size_t length,
    ddi_device_acc_attr_t *accattrp, uint_t xfermodes,
    int (*callback)(void *), void *arg, caddr_t *kaddrp,
    size_t *real_length, ddi_acc_handle_t *handlep);

handle is a DMA handle.

length is the length in bytes of the desired allocation.

accattrp is a pointer to a device access attribute structure.

xfermodes are data transfer mode flags.

callback is the address of the callback function for handling resource allocation failures. See ddi_dma_alloc_handle(9F).

arg is the argument to pass to the callback function.

kaddrp is a pointer that, on a successful return, contains the address of the allocated storage.

real_length is the length in bytes that was allocated.

handlep is a pointer to a data access handle.

xfermodes should be set to DDI_DMA_CONSISTENT if the device accesses memory in a nonsequential fashion, or if synchronization steps using ddi_dma_sync(9F) should be as lightweight as possible (because of frequent use on small objects). This type of access is commonly known as consistent access. I/O parameter blocks (IOPBs) that are used for communication between a device and the driver are set up this way.

On x86 systems, DDI_DMA_CONSISTENT can be used to allocate memory that is physically contiguous as well as consistent.

Code Example 7-6 shows how to allocate IOPB memory and the necessary DMA resources to access it. DMA resources must still be allocated, and the DDI_DMA_CONSISTENT flag must be passed to the allocation function.

Code Example 7-6 Using ddi_dma_mem_alloc(9F)

if (ddi_dma_mem_alloc(xsp->iopb_handle, size, &accattr,
    DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, &xsp->iopb_array,
    &real_length, &xsp->acchandle) != DDI_SUCCESS) {
    /* error handling */
    goto failure;
}
if (ddi_dma_addr_bind_handle(xsp->iopb_handle, NULL,
    xsp->iopb_array, real_length,
    DDI_DMA_READ | DDI_DMA_CONSISTENT, DDI_DMA_SLEEP,
    NULL, &cookie, &count) != DDI_DMA_MAPPED) {
    /* error handling */
    ddi_dma_mem_free(&xsp->acchandle);
    goto failure;
}

xfermodes should be set to DDI_DMA_STREAMING if the device is doing sequential, unidirectional, block-sized and block-aligned transfers to or from memory. This type of access is commonly known as streaming access.

For example, if an I/O transfer can be sped up by using an I/O cache, which at a minimum transfers (flushes) one cache line, ddi_dma_mem_alloc(9F) will round the size to a multiple of the cache line to avoid data corruption.

ddi_dma_mem_alloc(9F) returns the actual size of the allocated memory object. Because of padding and alignment requirements the actual size might be larger than the requested size. ddi_dma_addr_bind_handle(9F) requires the actual length.
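The effect of rounding on the returned length can be illustrated with plain arithmetic. This is a sketch only: round_up is a hypothetical helper, and the 64-byte granule used in the example is an assumed cache-line size, not a value taken from any platform. The actual rounding policy belongs to the system, which is why the driver uses the length that ddi_dma_mem_alloc(9F) returns rather than computing one itself.

```c
#include <assert.h>
#include <stddef.h>

/* Round length up to the next multiple of granule (hypothetical helper). */
static size_t round_up(size_t length, size_t granule)
{
    assert(granule > 0);
    return ((length + granule - 1) / granule) * granule;
}

/* Usage: with an assumed 64-byte cache line. */
static void round_up_example(void)
{
    assert(round_up(100, 64) == 128);  /* padded to two lines */
    assert(round_up(128, 64) == 128);  /* already a multiple */
    assert(round_up(1, 64) == 64);     /* at least one full line */
}
```

A 100-byte request would therefore occupy 128 bytes in this model, and it is the larger figure that would be passed on when binding the memory.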

ddi_dma_mem_free(9F) is used to free the memory allocated by ddi_dma_mem_alloc(9F).

Note - If the memory is not properly aligned, the transfer will succeed but the system will choose a different (and possibly less efficient) transfer mode that imposes fewer restrictions. For this reason, ddi_dma_mem_alloc(9F) is preferred over kmem_alloc(9F) when allocating memory for the device to access.
