Accessible Document Navigation using AT-SPI

Recommendations for Assistive Technology Developers

Authors and Contributors

Use cases: Catherine Laws, IBM
ATK and AT-SPI recommendations: Bill Haneman

Change History

July 19, 2005: Initial draft - Cathy Laws.
July 26, August 3, 2005: Initial AT-SPI recommendations for AT developers - Bill Haneman.
August 11, 2005: Moved rendering of element characteristics to new section, integration of Bill's AT-SPI recommendations, addition of Table of Contents, new section about roles - Cathy Laws.
Loretta Guarino Reid (Adobe) contributed to the list of desirable document object and landmark roles.
August 18, 2005: CKL: Deleted ATK implementation sections and changed perspective of document to be just for AT developers. ATK ADOC recommendations to be moved to a new document for application developers. Removed the list of scenarios column in the element characteristics table. Other minor related updates.

Table of Contents

Introduction
Terminology
Common Assumptions
Use case: Navigate or Collect All Accessible Content by Unit
Scenario 1: Navigate to the next, previous, current, first, or last item in the document
Scenario 2: Navigate to the next, previous, current, first, or last interactive and enabled element in the document
Scenario 3: Navigate to the next, previous, current, first, or last hyperlink in the document
Scenario 4: Navigate to the next, previous, current, first, or last enabled form control in the document
Scenario 5: Collect all elements with a shortcut key in the document
Scenario 6: Navigate to the next, previous, current, first, or last frame in the document
Scenario 7: Navigate to the next, previous, current, first, or last section identified by a heading in the document
Scenario 8: Navigate to the next, previous, current, first, or last data table in the document
Scenario 9: Navigate to the next, previous, current, first, or last embedded object or graphic in the document
Scenario 10: Navigate table cells by moving across rows and up and down columns within a grid or table
Scenario 11: Navigate to next, previous, current, first, last, up or down in a tree-style set of items or controls
Use case: Where am I
Use case: Document Summary
AT-SPI Interfaces to Obtain Element Characteristics for AT Renderings
Desirable Roles for Document Objects and Navigation

Introduction

Using AT-SPI interfaces, developers of assistive technologies may face difficulties trying to implement logical navigation of the accessible content and structures of complex documents for users with vision, learning, physical, and cognitive impairments. Some of the challenges include:

To show the hierarchy and structure of documents, the application developer must use the accessibility API to expose to the AT the document elements and their characteristics as they appear in the document hierarchy, not just as a stream of text and associated properties. Without this, contextual information is lost.

To help assistive technology developers provide better navigational user interfaces and hierarchical relationships for complex documents, this paper outlines document navigation use cases with descriptions of recommended AT-SPI interface usage by assistive technology developers. The use cases include navigation and collection of selected document containers and elements, Where am I, and document summary. Additional use cases may be added at a later time to address searching for accessible content, mutation events, and new document events.

Terminology

This section describes terminology used in this document. The terms described in this section are desirable from an accessible document navigation perspective, but there is not yet consensus that all the concepts and components described in the terms are necessary in the AT-SPI interfaces.

Accessible content: All of a document's elements and their characteristics (descriptions, roles, states, shortcut keys, language, etc.) that should be made available by the application through AT-SPI/ATK to the assistive technology. Then the AT can make any of these characteristics available to an end user based on user preferences.

Container: A document consists of one or more containers, which include grids, sections, trees, blocks (list of contiguous interactive or non-interactive elements), embedded objects or documents, and others. The document itself is a container.

Direction: When navigating to an item, element, the top of a container, or within a section, the direction parameter for an accessible document navigation request includes first, last, previous, next, or current. When navigating cells within a grid or table container, the directions include first in table, last in table, right in row, left in row, up one in column, down one in column, rightmost in row, leftmost in row, top in column, bottom in column, span right, span left, span up, span down, header right, header left, header up, header down. When navigating tree elements, the directions include first, last, next, previous, up, or down. The next and previous directions move among items at the same level (like headings), and the up and down directions allow the user to change levels. Navigation to the first and last tree items moves the POR to the first and last items within the same level.

Document landmark: A position in a document identified by a specific element attribute. In XHTML 2.0, the role attribute will identify document landmarks, such as main content, secondary content like a portlet, navigation bars, content information like footnotes and copyright statements, advertisement and logo banners, and notes.

Document summary: The document title, language, and number of tables, links, headings, frames, forms, controls, items, pages, and images.

Element: An information unit in a document for which each character in the unit has the same set of characteristics.

Element characteristics: Text and semantic characteristics of an element in a document that are derived from attributes usually specified by the document author or inherent in the object selected by the author, such as an element's alternative description, access or shortcut key, role, table summary value, a heading level, a link (URL) value, an image source filename, language, text styles, or an accessibility API defined state.

Element location: Type of parent container(s), location of element within a container(s), container title(s) and position(s) relative to the whole document, section heading, element position relative to the whole document, and document title.

Enabled elements: An enabled element is a piece of content, usually an interactive element, with associated behaviors that can be activated through the user interface or through a programming interface. Enabled elements may be temporarily disabled programmatically (i.e. no user input allowed).

Event handler: A script which is executed when an event (such as a mouse over, key press, mouse click, etc) of a given type occurs. An event handler is associated with a document element through document markup.

Frame: A container in a document that allows authors to present the document in multiple views, offering a way to keep certain information visible while scrolling or replacing other information in the views.

Grid: A container that has rows and columns, such as a table, spreadsheet, or calendar.

Information unit: A navigation unit for the accessible document interface. It can be a container type, an item, heading levels, an interactive element, or a block of interactive or non-interactive elements. Container types for navigation and collection include well defined sections (frames, divisions, spans, pages, slides, tabbed page or sheet), sections identified by document landmarks (such as headings, chapters, a role attribute in XHTML), and specialized sections or containers (forms, embedded objects, lists, header, footer, footnotes, table of contents, notes, annotations, index, bibliography). The subset of interactive elements for navigation includes form controls, bookmarks, elements with event handlers, and elements with access keys.

Interactive elements: Elements that have associated behaviors to be executed or carried out as a result of user or programmatic interaction. For example, the interactive elements of HTML include text and image links, image map areas, form elements, elements with access keys, DHTML widgets, and elements with event handlers. Interactive elements may be temporarily disabled programmatically (i.e. no user input allowed).

Item: Content in a document between each hard line break. Each character in the document is contained within one and only one item. For example, paragraphs, list items, table cells, images, map areas, and headings are items. Links and controls are usually not an item on their own but a part of an item unless a line break is forced.

Line: A specified number of characters, which may be the visual width of the application's edit/container control, an AT text view, the number of characters on a Braille display, or something else.

Listener interface: A programming interface, implemented by the assistive technology, to which the application accessibility interface asynchronously passes sets of information in response to accessible document events or requests from the assistive technology. The assistive technology monitors the listener interface. When the AT sees information waiting to be processed, it requests to receive that information from the listener interface.

Order: The navigation sequence of information units. If the information unit type is an interactive element or has a taborder characteristic like a tabindex, the order could be tab order versus document order.

Point of regard (POR): Within a document, a position that references an object or element, plus a character offset to reference a more specific character or word within the object. The current POR refers to the position in a document where an AT is currently retrieving element characteristics or location information which it intends to use to create an output rendering for the end user. The input focus position in a document does not necessarily match the current POR. An application's accessibility interface and the assistive technology must share a common definition for the POR. The assistive technology should manage the POR through a POR controller and may provide the application accessibility interface access to the POR controller for updating the POR. However, the accessibility interface may choose to communicate with the AT about POR values using interface parameters and return values.

POR Controller: A programming interface, implemented by the assistive technology, in which the current point of regard is updated and maintained. The assistive technology should expose the POR controller interface to the application, which could update the POR in response to accessible document requests and events. The assistive technology should update the POR based on output progress.
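
The POR and POR controller definitions above can be illustrated with a minimal Python sketch. It assumes the POR is modeled as an element reference plus a character offset, as the definition states; the names PointOfRegard, PORController, subscribe, and update are hypothetical and are not part of AT-SPI or ATK.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class PointOfRegard:
    """Hypothetical POR: a reference to an element plus a character offset."""
    element_id: str        # stand-in for a reference to an accessible object
    char_offset: int = 0


class PORController:
    """Hypothetical AT-side controller that stores the current POR."""

    def __init__(self) -> None:
        self._current: Optional[PointOfRegard] = None
        self._observers: List[Callable[[PointOfRegard], None]] = []

    @property
    def current(self) -> Optional[PointOfRegard]:
        return self._current

    def subscribe(self, callback: Callable[[PointOfRegard], None]) -> None:
        """Register a callback, e.g. the AT's rendering component."""
        self._observers.append(callback)

    def update(self, por: PointOfRegard) -> None:
        """Called by the AT (output progress) or by the application (events)."""
        if por == self._current:
            return  # no change, so observers are not notified
        self._current = por
        for callback in self._observers:
            callback(por)


controller = PORController()
changes: List[PointOfRegard] = []
controller.subscribe(changes.append)

controller.update(PointOfRegard("heading-2"))       # AT starts reading a heading
controller.update(PointOfRegard("heading-2"))       # duplicate update, ignored
controller.update(PointOfRegard("paragraph-5", 12)) # e.g. a caret-moved event
```

Keeping the POR immutable makes it cheap to compare two values, which is what lets the controller ignore redundant updates.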

Scope: A range of accessible content within a document which can limit accessible document navigation or searching. The scope is either the whole document or a specified container within the document.

Section: A container in a document that clearly separates its content from other content in the document. Examples include page, frame, form, map, navigation bar, block (list of interactive or non-interactive elements), object, content between headings, group boxes, tabbed pages, division, span, document landmarks identified by a role attribute or namespace, slide, sheet, note section, header, footer, table of contents, index, and others.

Streaming: When a navigation or search request will result in the return of multiple sets of information (like element characteristics and location information for the whole document or a collection of one type of element in the document), the AT receives and processes one set of information at a time by formatting and sending text output to the output devices before receiving all the sets of information from the application accessibility API listener interface. This technique provides the perception of a faster response time for the end user.
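
As a rough illustration of streaming, the following Python sketch uses a generator to stand in for the application side and a small listener class for the AT side. Both stream_characteristics and RenderingListener are hypothetical names invented for this example; the dictionaries are placeholder element characteristics.

```python
from typing import Dict, Iterable, Iterator, List


def stream_characteristics(
    elements: Iterable[Dict[str, str]]
) -> Iterator[Dict[str, str]]:
    """Application side: hand over one set of characteristics at a time."""
    for element in elements:
        yield element


class RenderingListener:
    """Hypothetical AT-side listener that renders each set as it arrives."""

    def __init__(self) -> None:
        self.rendered: List[str] = []

    def on_characteristics(self, info: Dict[str, str]) -> None:
        # A real AT would send this text to speech or Braille output.
        self.rendered.append(f"{info['role']}: {info['name']}")


document = [
    {"role": "heading", "name": "Introduction"},
    {"role": "link", "name": "Terminology"},
    {"role": "paragraph", "name": "Common Assumptions"},
]

listener = RenderingListener()
stream = stream_characteristics(document)

# The first set is rendered before the remaining sets have been produced.
listener.on_characteristics(next(stream))
output_after_first = list(listener.rendered)

for info in stream:
    listener.on_characteristics(info)
```

The user hears or feels the first item while later items are still being fetched, which is the perceived-latency benefit the definition describes.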

Tree: Container or set of elements that consists of more than one level of lists, such as folders plus files, menus, heading levels, and lists with embedded lists.

Common Assumptions

Some assumptions are common to all accessible document use cases. Unique assumptions are stated in the individual use case descriptions.

  1. The AT should implement a POR controller and a listener interface that the application accessibility interface can update with new PORs, element characteristics, and event notifications. The application returns requested information and event notifications through a listener interface rather than directly. This allows the application and AT to use streaming techniques to process returned information, which will improve the user's perception of response times (performance). (Note: There is not yet consensus and agreement regarding this assumption.)
  2. Activation of interactive and enabled elements should be handled through the application. This includes default actions (onclick which may change the state or value of a control, move focus to a text field, or follow a link), activation of a link or control that matches an access or shortcut key, submitting a form, or triggering an event handler like a mouse over event or a keypress independent of a required input device. The AT may handle activation of interactive and enabled elements when they provide alternative input mechanisms (Braille input, single switch, speech recognition, sometimes keyboard, etc) not provided by the application. Applications usually provide keyboard mechanisms for performing default actions (such as the onclick action) for controls and links, but not always for other event handlers, such as a mouse over action or a key press action. For these cases, the AT may provide the keyboard interface. Optimally, the application should provide access to non-default actions from a context menu or through some other UI.
  3. When the client application provides word, character, and line navigation and selection, the AT should rely on the application's accessibility APIs and definition of a word, character, and line. A line definition may be based on a specified number of characters, which may be the visual width of the application's edit/container control or the visual width of a column within a single element. Word and character definitions must include internationalization considerations. When word, character, or line navigation is not properly provided by the application, or when the AT wants to provide this type of navigation and selection for its "text or spoken view" of the document, the AT may determine its own definitions of word, character, and line. The line definition may be based on the visual width of an AT text view, the number of characters on a Braille display, or some other line definition based on the text view displayed on the desired output device.
  4. The application should maintain end user preferences and features related to visual presentation and events that can benefit everyone, such as a zoom feature, meta refresh rates, style sheets and fonts, the display of images, etc.
  5. The actual accessible content rendered by the AT through some output device may contain a subset of the element characteristics returned by the application's accessibility interface, determined in part by AT end user accessible document preferences. So the AT may only request the desired subset of element characteristics from the accessibility interface in order to achieve optimal performance. End user preferences for accessible document announcements may include whether or not and how to announce headings and their levels, visited links, links, list items, event handlers, table cell information, row headers, column headers, no or empty image descriptions, table summaries, abbreviations and acronyms, and access and shortcut keys.
  6. Selection of accessible content should be handled by the AT UI based on a visible or invisible "text view," and selection of the default rendering of the document (like the real Web page) will be handled by the application UI.
  7. Data tables in HTML documents are distinguished from layout tables by the presence of a CAPTION, TH cells, header attributes, a role="presentation", and/or a summary attribute.
  8. If collecting all of one type of information unit, it is for the purpose of listening to, seeing, printing, selecting, or saving a view or list of all the requested accessible content for a specified information unit type for the document or for a specified container in the document.

Use case: Navigate or Collect All Accessible Content by Unit

Description:
Navigate or collect all accessible content of the whole document or a container in a document (scope) by a specified information unit type (container, item, interactive or non-interactive element, or block of interactive or non-interactive elements), in a specified direction and order, starting from the current point of regard (POR). If collecting all of an information type in the whole document or container, the collection starts from the beginning of either the whole document or the specified container (scope), with or without updating the current POR to the beginning of the document or container. The AT should receive through a listener interface a specified subset of element characteristics for each element in the requested information units. A listener interface and streaming techniques implemented in the application's ATK interfaces will help the AT address performance issues more easily when obtaining collections from large documents, instead of making multiple and different types of AT-SPI calls or limiting collections to the visible area of the document to manage performance.

Pre-conditions:
If requested scope is not the whole document, the point of regard must be within the type of container you want to navigate.

Failed end conditions:

  1. The document or container is empty.
  2. Requested information unit not in the document or container.
  3. No next information unit - bottom of document or container.
  4. No previous information unit - top of document or container.
  5. Point of regard is not in the requested container.
  6. Requested information type not valid for requested container. (e.g. cell is only allowed for a grid container)
  7. Requested direction not valid for requested container and information type. (e.g. grid directions not allowed for non-grid container, some tree directions not allowed for non-tree containers)
  8. Top of grid (table).
  9. Bottom of grid (table).
  10. Right edge of grid (table).
  11. Left edge of grid (table).
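
Failed end conditions 6 and 7 can be checked before any traversal starts. The Python sketch below is one possible reading of the Direction definition in the Terminology section: grid directions apply only when navigating cells in a grid, tree directions apply inside a tree, and the general directions apply elsewhere. The function validate_request, its string arguments, and its messages are hypothetical.

```python
from typing import Optional

GENERAL_DIRECTIONS = {"first", "last", "previous", "next", "current"}
GRID_DIRECTIONS = {
    "first in table", "last in table", "right in row", "left in row",
    "up one in column", "down one in column", "rightmost in row",
    "leftmost in row", "top in column", "bottom in column",
    "span right", "span left", "span up", "span down",
    "header right", "header left", "header up", "header down",
}
TREE_DIRECTIONS = {"first", "last", "next", "previous", "up", "down"}


def validate_request(container: str, unit: str, direction: str) -> Optional[str]:
    """Return a failure message, or None when the request is acceptable."""
    if unit == "cell":
        if container != "grid":
            # Failed end condition 6: cell is only allowed for a grid.
            return "information unit not valid for container"
        allowed = GRID_DIRECTIONS
    elif container == "tree":
        allowed = TREE_DIRECTIONS
    else:
        allowed = GENERAL_DIRECTIONS
    if direction not in allowed:
        # Failed end condition 7: direction does not fit the container.
        return "direction not valid for container and information unit"
    return None
```

For example, validate_request("section", "item", "up") reports an invalid direction, while validate_request("tree", "item", "up") returns None.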

Scenario 1: Navigate to the next, previous, current, first, or last item in the document

Main scenario description:
An item is content in a document between each hard line break. Each character in the document is contained within one and only one item. For example, paragraphs, list items, table cells, images, map areas, and headings are items. Links and controls are usually not an item on their own but a part of an item unless a line break is forced. This scenario allows an end user to navigate sequentially through all the different elements in a document in document order and "see, hear, or feel" their content and element characteristics, regardless of the element type. This navigation includes form controls which are read-only or disabled.

Alternate scenarios:

  1. Collect all the items and their related characteristics in the document for the purpose of listening to, seeing, printing, selecting, or saving a text view of all the accessible content in the document. The starting POR could be the current POR or the beginning of the document.
  2. Instead of navigating to or collecting all the items in the whole document, the scope could be limited to a specified container or section in the document.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each element in each item.

General use of AT-SPI interfaces by ATs:

Using AT-SPI version 1.6:

  1. From the current or specified POR, walk up the accessible object hierarchy (e.g. the document tree) via Accessible::getParent. Some heuristics are required in order to identify the document content boundaries using AT-SPI 1.6. For example, document searches are normally bounded by containers of Role, such as ROLE_HTML_CONTAINER, ROLE_CANVAS, ROLE_VIEWPORT, and a few other roles. The AT may choose to bound document traversal by other 'containership' roles such as ROLE_TABLE, ROLE_TREE, ROLE_LIST, etc.
  2. Performance is not a particular concern in the primary use case, since AT-SPI provides 'random access' to children within a container. Determining the first 'presentable item' may require further descending into containers. Once the bounding container is identified, calls to Accessible::getChildAtIndex(0) should be made until an appropriate presentation object is found. An 'appropriate presentation object' is identifiable by the presence of the Text, Image, or Action interface, or the presence of certain known Roles such as ROLE_LABEL. If an object which implements Table is encountered, additional heuristics may be needed in order to identify a suitable "first presentation item" (typically the table cell at row 0, column 0). The last item in a container may similarly be identified by traversing down the containers using Accessible::getChildAtIndex(Accessible::getChildCount() - 1). In the event that such traversal ends at a 'leaf node' which has no suitable content (such as an object of ROLE_FILLER), next/prev heuristics should be applied until an appropriate content item for user presentation is encountered.
  3. For "next" and "previous" navigation, the object whose content is associated with the current POR should be checked for the FLOWS_FROM/FLOWS_TO relation; if one exists, the object pointed to by the relevant relation is the "next" or "previous" object and the POR should be changed accordingly. Otherwise, "next" and "previous" items are located as follows. For 'next', if the current POR resides on an object which is not the last in its container (known either from previous traversal or from Accessible::getIndexInParent/Accessible::getChildCount), the 'next' item is found via <parent>::getChildAtIndex(<child>::getIndexInParent()+1). This object is tested against the criteria in step 2 above for user-presentable (i.e. non-empty) content. If there is no suitable sibling item in the hierarchy, the tree is traversed via the parent to the 'next' nodes in the tree. 'Previous' is located symmetrically, using <child>::getIndexInParent()-1.
  4. POR is handled by a combination of listening for events from the application which implicitly or explicitly change the POR, and optionally by defining POR in terms of the item (element or elements) currently being presented to the user by the AT. If the POR changes while iterating over a document, iteration may be immediately resumed from the new POR.
  5. A weakness of the above algorithm for finding the 'last' content in a document is that in some cases the 'last' item according to content flow will not be 'last' in the canonical hierarchy. In the event that this is the case, the canonically-last item will have a FLOWS_FROM relation, which needs to be followed serially until the end of the content flow is reached. This may have a user-visible performance impact in isolated situations.
  6. Note that objects should normally be tested for the presence of STATE_VISIBLE and STATE_SHOWING before presentation to the end-user, otherwise the user may be presented data that is not exposed to the sighted user and therefore may be invalid, out of date, or irrelevant. STATE_VISIBLE means that it is not hidden content but it may or may not be currently displayed on the screen. STATE_SHOWING means that it is in the part of the document that is currently being displayed on the screen.
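
The 1.6-style traversal in the steps above can be sketched in Python as follows. The Node class is a hypothetical stand-in for an accessible object and deliberately does not use the real AT-SPI bindings; role names, the flows_to attribute, and the helper functions is_presentable, successor, and next_item are assumptions made for illustration. The sketch covers 'first' and 'next' only.

```python
from typing import List, Optional, Set

CONTENT_INTERFACES = {"Text", "Image", "Action"}


class Node:
    """Hypothetical stand-in for an accessible object; not the AT-SPI API."""

    def __init__(self, name: str, role: str, interfaces: Set[str] = frozenset(),
                 showing: bool = True) -> None:
        self.name = name
        self.role = role
        self.interfaces = set(interfaces)
        self.states = {"VISIBLE", "SHOWING"} if showing else {"VISIBLE"}
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        self.flows_to: Optional["Node"] = None  # target of a FLOWS_TO relation

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def is_presentable(node: Node) -> bool:
    """Steps 2 and 6: content-bearing, visible, and showing."""
    has_content = bool(node.interfaces & CONTENT_INTERFACES) or node.role == "label"
    return has_content and {"VISIBLE", "SHOWING"} <= node.states


def successor(node: Node, boundary: Node) -> Optional[Node]:
    """Next node in document (pre-order) order, staying inside boundary."""
    if node.children:
        return node.children[0]
    while node is not boundary and node.parent is not None:
        siblings = node.parent.children
        index = siblings.index(node)
        if index + 1 < len(siblings):
            return siblings[index + 1]
        node = node.parent  # no later sibling: continue from the parent
    return None


def next_item(node: Node, boundary: Node) -> Optional[Node]:
    """Step 3: prefer FLOWS_TO, otherwise walk the hierarchy."""
    if node.flows_to is not None:
        return node.flows_to
    candidate = successor(node, boundary)
    while candidate is not None and not is_presentable(candidate):
        candidate = successor(candidate, boundary)
    return candidate


doc = Node("doc", "html_container")
filler = doc.add(Node("filler", "filler"))
filler.add(Node("Introduction", "heading", {"Text"}))
doc.add(Node("First paragraph", "paragraph", {"Text"}))
doc.add(Node("Off-screen note", "paragraph", {"Text"}, showing=False))
doc.add(Node("Logo", "image", {"Image"}))

visited = []
item = next_item(doc, doc)  # first presentable item inside the boundary
while item is not None:
    visited.append(item.name)
    item = next_item(item, doc)
```

Each call to successor corresponds to one or more round trips in a real AT-SPI client, which is why the filler and the off-screen paragraph still cost a visit even though they are never presented.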

Using AT-SPI with the proposed Collection, Document, etc. enhancements:

  1. Using the Collection, Document and extended Accessible API yields considerable advantages in this scenario. Document-wide searches are immediately bounded by Accessible::getDocument, which returns the containing object for a given document's content. Since this element is required to implement Collection as well, first/last/next/prev can be obtained with only one additional API call. If the POR changes, even an 'arbitrary' change in POR (or a change in situations in which the AT does not maintain a cache) can be accommodated without the need to call Accessible::getDocument again, since Collection::isAncestorOf(por) may be used to determine whether the POR has moved outside the current document. The first and last matching objects in a document may be obtained via calls to Collection::getChildren, specifying a list-length of 1, and requesting the appropriate sort order (or 'reverse' sort order). Similarly, the next and previous matching objects in a document may be obtained via Collection::getNextChildren and Collection::getPreviousChildren.
  2. To retrieve each and every presentable and non-empty object in a document, a sufficient match rule would be to leave all match parameters empty, other than MatchRule. As long as MatchRule is not "MATCH_EMPTY", then any element in a Collection will match.
  3. Significantly, the extended APIs allow navigation in 'flow order' or 'tab order' without additional client-side API calls. Also, the Collection API allows the client to specify matching element characteristics in the API call, so the client need not do its own filtering. This also eliminates the need to 'walk the tree' or visit nodes which are not of current interest to the client.
  4. While this already will have a major performance impact, by reducing api calls manyfold, if the client anticipates the need to continue iterating through a document using the same match criteria, additional performance benefits are expected by requesting a longer return list; this prevents each navigation iteration requiring separate roundtrips. In practice, to reduce user-visible latency, an initial call to retrieve a single element, followed by a second request for a longer list while streaming the presentation info, will often be the optimum choice. In some situations the AT may choose to bound the presentation of content to the currently visible section; this may be accomplished by including STATE_VISIBLE and STATE_SHOWING among the 'State' criteria in the getChildren/getNextChildren/getPreviousChildren call.
  5. Once the element or elements to be presented have been obtained, additional API calls must be made to determine the type of element (i.e. hyperlink, control, text, image, table, etc., as determined by querying the implemented interfaces, role, state, and so on) and the unique attributes which will be presented to the user or otherwise serialized. Since this step happens serially in "user time", performance is not an issue here. Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings to determine which calls should be made for each type of element in each item.
  6. As in the 1.6 AT-SPI case, POR is managed according to a policy set by the AT, based on POR-relevant at-spi events from the application (e.g. focus, object:active-descendant-changed, text:caret-moved, component:bounds-changed, window:activate, object:selection-changed) and on the element or elements currently being presented to the end user by the AT. For instance, a document navigation feature in an AT may define the POR to be the text currently being spoken or brailled, unless/until a caret-moved or focus event from the application is received. (Such POR-related events from at-spi are almost always a consequence of end-user interaction with an input device, and therefore arguably should take precedence over the "current progress" of the AT's presentation system). In the event that the POR changes due to user input while iterating over an AccessibleSet, either the object associated with the POR may be compared with the AccessibleSet members, or the call to Collection::getNextChildren may be re-issued, with trivial performance impact.
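
The matching and list-length ideas in the steps above can be sketched with an in-memory model. This is not the proposed Collection interface itself: the classes Element, MatchRule, and DocumentCollection and the snake_case method names are hypothetical, and the real signatures may differ. The sketch assumes the collection already holds elements in content flow order.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Element:
    name: str
    role: str
    states: FrozenSet[str] = frozenset({"VISIBLE", "SHOWING"})


@dataclass(frozen=True)
class MatchRule:
    roles: FrozenSet[str] = frozenset()   # empty means "any role"
    states: FrozenSet[str] = frozenset()  # every listed state must be present

    def matches(self, element: Element) -> bool:
        role_ok = not self.roles or element.role in self.roles
        return role_ok and self.states <= element.states


class DocumentCollection:
    """Hypothetical model of a document container that supports matching."""

    def __init__(self, elements: List[Element]) -> None:
        self._elements = list(elements)  # assumed to be in flow order

    def get_children(self, rule: MatchRule, count: int = 0,
                     reverse: bool = False) -> List[Element]:
        source = reversed(self._elements) if reverse else self._elements
        found = [e for e in source if rule.matches(e)]
        return found[:count] if count else found  # count 0 means "no limit"

    def get_next_children(self, current: Element, rule: MatchRule,
                          count: int = 0) -> List[Element]:
        start = self._elements.index(current) + 1
        found = [e for e in self._elements[start:] if rule.matches(e)]
        return found[:count] if count else found


doc = DocumentCollection([
    Element("Introduction", "heading"),
    Element("Terminology", "link"),
    Element("Hidden tip", "paragraph", frozenset({"VISIBLE"})),
    Element("Use cases", "link"),
])

links = MatchRule(roles=frozenset({"link"}))
first_link = doc.get_children(links, count=1)
last_link = doc.get_children(links, count=1, reverse=True)
everything = doc.get_children(MatchRule())
on_screen = doc.get_children(MatchRule(states=frozenset({"VISIBLE", "SHOWING"})))
following = doc.get_next_children(first_link[0], links, count=5)
```

Requesting a single element first and a longer list second, as step 4 suggests, maps onto a call with count=1 followed by a get_next_children call with a larger count.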

Main scenario: Navigate by item

Using AT-SPI 1.6:

Using AT-SPI with Collection/Doc enhancements:

Alternate scenario 1: Collect all items

AT-SPI version 1.6:

  1. Retrieving all the user-presentable content in a document is similar to the Main scenario, but is in fact simpler because context and hierarchical information may be retained while the document is traversed. This context may be leveraged to make the collation of content more efficient than brute force application of the 'first/next' operations listed above in the 'Main scenario'. Note also that in addition to the FLOWS_FROM and FLOWS_TO relations, the EMBEDS relation may indicate when a particular piece of content is 'embedded' in another, for instance when an image lies 'inside' a text block.
  2. If the presentation of content takes place in 'user time', performance is a secondary consideration. Streaming of data is advisable however. In some cases AT may choose to restrict the presentation to the visible portion of the document. In this case, the POR may be programmatically changed by changing the text caret, adjusting a scrollbar (Value::setValue), or activating a user interface element which advances the document viewport (Action::doAction).
  3. The situations in which performance issues may be visible to the end-user are primarily those involving AT-side content search, and serialization/conversion of data. Content search is normally the domain of the user agent and is better achieved there. If the information to be serialized or searched is the "AT's presentation view" of the data, as opposed to the content itself, then only the AT can achieve the desired results, as only the AT can know its presentation model. If the content itself is being serialized, stored, or converted, then the StreamableContent interface is preferred.
  4. Clients should be careful about enumerating the children of objects with STATE_MANAGES_DESCENDANTS; this is often used as an indicator that the container is either unbounded or very large. In such cases, techniques using the Component::getBounds and Component::getAccessibleAtPoint apis to establish the visible viewport should be used. This is particularly important for tables.
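
A collect-all traversal that retains hierarchical context and avoids enumerating the children of a STATE_MANAGES_DESCENDANTS container might look like the Python sketch below. The Node class and the collect function are hypothetical stand-ins; a real client would locate the visible children of such a container with the Component techniques mentioned in step 4, which this sketch omits.

```python
from typing import Iterator, List, Set, Tuple


class Node:
    """Hypothetical accessible object with just enough data for the sketch."""

    def __init__(self, name: str, states: Set[str] = frozenset()) -> None:
        self.name = name
        self.states = set(states)
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def collect(root: Node) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) in document order, one node at a time."""
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        if "MANAGES_DESCENDANTS" in node.states:
            continue  # possibly huge or unbounded: do not enumerate children
        # Push in reverse so the first child is visited first.
        for child in reversed(node.children):
            stack.append((depth + 1, child))


doc = Node("doc")
section = doc.add(Node("section"))
section.add(Node("paragraph"))
sheet = doc.add(Node("spreadsheet", {"MANAGES_DESCENDANTS"}))
for i in range(1000):
    sheet.add(Node(f"cell {i}"))
doc.add(Node("footer"))

outline = [(depth, node.name) for depth, node in collect(doc)]
```

Because collect is a generator, the AT can begin presenting the first nodes while the rest of the document is still being walked, in line with the streaming advice in step 2.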

Using AT-SPI with the Collection and Document enhancements:

  1. Using Collection, the user presentable content may be retrieved in a single API call, serialized according to document content flow. In practice, an AT may wish to make two small adjustments to this strategy; rather than retrieving all content, the AT may specify the length of returned lists and therefore handle the streamed data in 'chunks' for performance reasons. Also, the AT may wish to handle certain types of content (for instance, data tables) differently, and therefore may request multiple lists of content which it then interleaves according to its own rules. Such independent lists of content may be collated with one another by use of the getNextChildren and getPreviousChildren APIs, for instance to determine where, in a sequence of text blocks, to present a Table.
  2. Lastly, note that for many purposes the actual content itself may be transported from the application to the AT, via the StreamableContent interface. This is primarily useful for saving, printing, or post-processing entire documents, or for persisting alternate views of a document.

Alternate scenario 2: Navigate to or collect all items within a limited scope

Using AT-SPI version 1.6:
The method is the same as for the entire document, except that a different boundary condition is used for the traversal.

Using AT-SPI with the Collection and Document enhancements:
This can be trivially done if the relevant containers implement Collection. If they do not, Collection::getNextChildren() may be called using the bounding container as the POR, and a traversal of the returned list may be truncated when the list items no longer fall within the bounding container. This must be established via calls to Accessible::getParent at present.
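
The truncation step can be sketched as below. This is an illustrative Python model under stated assumptions: Node is a hypothetical class whose parent attribute plays the part of Accessible::getParent, and the input list is assumed to be in flow order already.

```python
from typing import List, Optional

class Node:
    """Hypothetical stand-in for an accessible object with a parent link."""
    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent

def is_within(node: Node, container: Node) -> bool:
    """Walk the parent chain looking for the bounding container."""
    current = node.parent
    while current is not None:
        if current is container:
            return True
        current = current.parent
    return False

def truncate_to_container(items: List[Node], container: Node) -> List[Node]:
    """Keep items from a flow-ordered list until one falls outside container."""
    kept = []
    for item in items:
        if not is_within(item, container):
            break
        kept.append(item)
    return kept

doc = Node("document")
section = Node("section", doc)
a = Node("a", section)
b = Node("b", Node("list", section))  # nested two levels below the section
c = Node("c", doc)                    # first item past the bounding container
result = [n.name for n in truncate_to_container([a, b, c], section)]
```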

Scenario 2: Navigate to the next, previous, current, first, or last interactive and enabled element in the document

Main scenario description:
Interactive elements include text and image links, image map areas, form elements, elements with access or shortcut keys, DHTML widgets like menus and calendars and spreadsheets, and elements with event handlers. This scenario allows an end user to navigate sequentially through all the interactive elements and their characteristics in a document in document order. This navigation does not include form controls which are read-only or disabled.

Note: User agents (such as a browser) usually provide navigation (i.e. the Tab key) in tab order to the combined collection of form controls and hyperlinks, but not usually to just hyperlinks, just form controls, or just elements with access keys. Also, the user agent may not provide keyboard navigation to elements that are not normally interactive but are made interactive programmatically, such as elements in a Web page with JavaScript event handlers. An AT may want to provide navigation to or lists for the individual types of interactive elements (text links, image links, form controls, elements with access keys, elements with event handlers, etc) as well as for the combined collection in both document order and tab order. See other scenarios below for navigation to individual types of interactive elements.

Alternate scenarios:

  1. Collect all the interactive and enabled elements and their characteristics in the document for the purpose of creating a list.
  2. The AT may want to specify tab order instead of document order for navigation or for the list.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each interactive element.

Use of AT-SPI interfaces by ATs:

Using AT-SPI 1.6:

  1. The presence of interactive elements is signalled by the Action interface. In addition, the Hypertext interface indicates text and image links and map areas. Iterating over the document and collecting the objects which implement Action or Hypertext is the best way to identify interactive elements. Experience has shown that iterating through the children of Action-implementors is also useful, as some implementors of Action also have actionable children (which, themselves, implement Action). Objects implementing Action which are currently enabled may be identified by the presence of STATE_SENSITIVE.
  2. AT-SPI 1.6 cannot programmatically identify Tab order. In many use cases this is unimportant, since the user will be using Tab directly to interact with an interface - that is, the interface itself provides the user-traversable list of controls. However, if it is important to present the user with a list of elements without user interaction with the keyboard, TAB order can only be inferred from 'canonical' index order.
  3. For performance reasons, and for reasons of consistency with the visual content presentation, the iteration may be restricted to the visible area. In practice this means identifying objects with viewports (for instance, objects with interface Table, or with ROLE_HTML_CONTAINER or ROLE_VIEWPORT) and restricting search to the visible range determined via Component::getBounds and Component::getAccessibleAtPoint.
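
The iteration described in item 1 can be sketched in Python as follows. The Acc class is a hypothetical model of an accessible object (it is not a real binding), and interface and state names are plain strings chosen for illustration.

```python
from typing import Iterator, List, Set

class Acc:
    """Hypothetical accessible: a name, its interfaces, states, and children."""
    def __init__(self, name, interfaces=(), states=(), children=()):
        self.name = name
        self.interfaces: Set[str] = set(interfaces)
        self.states: Set[str] = set(states)
        self.children: List["Acc"] = list(children)

def walk(node: Acc) -> Iterator[Acc]:
    """Depth-first traversal in document order. Children of Action
    implementors are visited too, since they may themselves be actionable."""
    yield node
    for child in node.children:
        yield from walk(child)

def interactive_elements(root: Acc) -> List[Acc]:
    """Collect Hypertext implementors and enabled (SENSITIVE) Action implementors."""
    found = []
    for node in walk(root):
        enabled_action = "Action" in node.interfaces and "SENSITIVE" in node.states
        if "Hypertext" in node.interfaces or enabled_action:
            found.append(node)
    return found

doc = Acc("doc", children=[
    Acc("para", ["Hypertext"]),
    Acc("menu", ["Action"], ["SENSITIVE"],
        [Acc("item", ["Action"], ["SENSITIVE"])]),
    Acc("disabled button", ["Action"]),  # lacks SENSITIVE, so it is skipped
])
names = [n.name for n in interactive_elements(doc)]
```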

Using AT-SPI with Collection and Document enhancements:

  1. The Collection::getChildren API should be used with a repoid "IDL:Accessibility/Action:1.0" and a StateSet including STATE_SENSITIVE, STATE_VISIBLE, and STATE_SHOWING in order to retrieve the actionable elements. In some cases STATE_VISIBLE and STATE_SHOWING may be omitted, but only if STATE_MANAGES_DESCENDANTS is excluded from the list of valid states. The sort order should be SORT_ORDER_FLOW. Hyperlink elements are identified by the Hypertext interface.
  2. Because searching, filtering, and sorting takes place without IPC roundtrips, performance concerns should be very minor; if there are any, the returned list may be limited in length to 100 or so via the 'count' parameter to Collection::getChildren().
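
The effect of such a query can be modelled with the short Python sketch below. It only imitates the filtering that the application side would perform: get_children here is a hypothetical local function, not the Collection API itself, and the Item records are invented test data.

```python
from itertools import islice
from typing import FrozenSet, List, NamedTuple

class Item(NamedTuple):
    name: str
    interfaces: FrozenSet[str]
    states: FrozenSet[str]

ACTION = "IDL:Accessibility/Action:1.0"
WANTED = frozenset({"SENSITIVE", "VISIBLE", "SHOWING"})

def get_children(items: List[Item], interface: str,
                 required_states: FrozenSet[str], count: int) -> List[Item]:
    """Return at most count items, in the given (flow) order, that implement
    interface and hold every state in required_states."""
    matches = (item for item in items
               if interface in item.interfaces
               and required_states <= item.states)
    return list(islice(matches, count))

ITEMS = [
    Item("ok button", frozenset({ACTION}), WANTED),
    Item("hidden button", frozenset({ACTION}), frozenset({"SENSITIVE"})),
    Item("static text", frozenset({"IDL:Accessibility/Text:1.0"}), WANTED),
    Item("submit", frozenset({ACTION}), WANTED | {"FOCUSABLE"}),
]
all_actionable = [i.name for i in get_children(ITEMS, ACTION, WANTED, 100)]
first_only = [i.name for i in get_children(ITEMS, ACTION, WANTED, 1)]
```

A small count on the first call returns as soon as enough matches are found, which is the latency benefit the 'count' parameter is meant to provide.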

For additional navigation techniques and recommendations, see the section on Desirable Roles for Document Objects and Navigation.

Main and Alternate 1 scenario: Navigate to all interactive and enabled elements (controls, links, elements with event handlers, etc) in document order

Using AT-SPI 1.6:
The main scenario is achievable without AT intervention if the user agent provides basic keyboard navigation of the interactive elements.
The AT can achieve brute-force 'navigation' of the controls as follows: iterate through the document, possibly pruning nodes that are invisible. Collect the list of objects implementing 'Action' (Accessible::isAction()) whose StateSet includes SENSITIVE, i.e. for which Accessible::getStateSet()::compare(sensitive_state) does not return an empty StateSet.

Using AT-SPI with enhancements:
From POR, find Document via Accessible::getDocument. Call Collection::getChildren (), requesting SENSITIVE implementors of "Accessibility/Action". This list can be the basis of end-user-visible iteration. To navigate with respect to some already-determined POR, the Collection::getNextChildren api can be used instead. The length of the returned list of actionable items may be controlled by the AT, i.e. it may be 1 or 1000. Making it small on the initial call, when AT navigation begins, may reduce user-visible latency.

Alternate scenario 2: Navigate to or collect all interactive and enabled elements in tab order

Using AT-SPI 1.6:
Tab order is known only to the user agent.

Using AT-SPI with Collections enhancements:
Specify SORT_ORDER_TAB in place of SORT_ORDER_FLOW in the calls to Collection:: apis.

Scenario 3: Navigate to the next, previous, current, first, or last hyperlink in the document

Main scenario description:
Hyperlink elements include text and image links and image map areas. This scenario allows an end user to navigate sequentially through all the hyperlink elements and their characteristics in a document in document order.

Alternate scenarios:

  1. Collect all the hyperlink elements in the document for the purpose of creating a list.
  2. The AT may want to specify tab order instead of document order for navigation or for the list.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each hyperlink.

Use of AT-SPI interfaces by ATs:

  1. Hyperlink elements are identified by the Hypertext interface. Hyperlinks are retrieved from objects implementing Hypertext, and are associated with a particular range of character offsets in the relevant Text object. Hyperlinks can be further queried and interacted with by requesting their URI and/or querying them for a backing object (which, for instance, may implement Action). Client-side image maps are examples of Hyperlinks that have multiple 'anchors'; each anchor has a URI and backing object, and in the case of client-side image maps, the backing objects should implement Image and Action.
  2. Within an object implementing Hypertext there may be multiple links. These are addressable in index order, with 0 being the first hyperlink in a text block, etc.
  3. The task of navigating from link to link is normally under the control of the user agent, in which case the navigation scenario becomes trivial (listening to "focus" events and checking state changes on the relevant text objects as the user POR moves from link to link).
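
The relationship between a Hypertext object, its index-ordered links, and their character offsets can be illustrated as follows. This Python sketch uses hypothetical Link records and example.org URIs; treating the end offset as exclusive is an assumption of the sketch, not a statement about AT-SPI.

```python
from typing import List, NamedTuple, Tuple

class Link(NamedTuple):
    start: int  # start character offset in the host text
    end: int    # end offset (treated as exclusive in this sketch)
    uri: str

def make_link(text: str, label: str, uri: str) -> Link:
    """Build a Link covering the first occurrence of label in text."""
    start = text.index(label)
    return Link(start, start + len(label), uri)

def link_texts(text: str, links: List[Link]) -> List[Tuple[int, str, str]]:
    """Return (index, link text, URI) triples; index 0 is the first link."""
    return [(i, text[l.start:l.end], l.uri) for i, l in enumerate(links)]

TEXT = "See the GNOME site or the ATK docs."
LINKS = [
    make_link(TEXT, "GNOME site", "http://example.org/gnome"),
    make_link(TEXT, "ATK docs", "http://example.org/atk"),
]
listing = link_texts(TEXT, LINKS)
```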

Main scenario: Navigate to all hyperlinks in document order

Using AT-SPI 1.6:
From the current POR, or start of document, the AT iterates through content until the first/next object implementing the Hypertext interface is found. The AT should then present the hyperlink object as above, or, in some situations, may present other information such as the URI of one or more of the link's "anchors".

Using AT-SPI enhancements:
As above, but Collection::getNextChildren (... "IDL:Accessibility/Hypertext:1.0", ..., SORT_ORDER_FLOW ) may be used to avoid iteration through the document. When querying Image hyperlinks, the Image::getLocale api should be used to determine the language to be used when presenting the image description.

Alternate scenario 1: Collect a list of all hyperlinks in document order
Trivially the same as the main scenario. If the AT-SPI Collection enhancements are not present, streaming and preloading may be used to reduce end-user-visible latency in compiling the list, i.e. the first element of the list should be obtained and presented before traversing the entire document.

Alternate scenario 2: Navigate to or collect a list of all hyperlinks in tab order

Using AT-SPI 1.6:
Tab order is unknown. However, if the user agent implements keyboard navigation, it is unnecessary to compile the list in order to traverse the links in tab order.

Using AT-SPI Collection interface:
SORT_ORDER_TAB may be substituted for SORT_ORDER_FLOW.

Scenario 4: Navigate to the next, previous, current, first, or last enabled form control in the document

Main scenario description:
Form controls include radio buttons, check boxes, text fields and areas, password fields, buttons, select menu and options, combo boxes, sliders, and other custom controls, such as DHTML widgets (menus, calendars, spreadsheets, etc.). This scenario allows an end user to navigate sequentially through all the form control elements and their characteristics in a document in document order. This navigation does not include form controls which are read-only or disabled.

Alternate scenarios:

  1. Collect all the form controls in the document for the purpose of creating a list.
  2. The AT may want to specify tab order instead of document order for navigation or for the list.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each form control.

Use of AT-SPI interfaces by ATs:

Main and Alternate 1 scenario: Navigate to or collect all enabled form controls in document order

Using AT-SPI 1.6:
Iterate through the document, possibly pruning nodes that are invisible. Collect the list of objects implementing 'Action' (Accessible::isAction()) whose StateSet includes SENSITIVE, i.e. for which Accessible::getStateSet()::compare(sensitive_state) does not return an empty StateSet.

Using AT-SPI with enhancements:
From POR, find Document via Accessible::getDocument. Call Collection::getChildren (), requesting SENSITIVE implementors of "Accessibility/Action". This list can be the basis of end-user-visible iteration. To navigate with respect to some already-determined POR, the Collection::getNextChildren api can be used instead. The length of the returned list of actionable items may be controlled by the AT, i.e. it may be 1 or 1000. Making it small on the initial call, when AT navigation begins, may reduce user-visible latency.

For additional navigation techniques and recommendations, see the section on Desirable Roles for Document Objects and Navigation.

Alternate scenario 2: Navigate to and collect a list of all enabled form controls in tab order

Using AT-SPI 1.6:
Tab order is known only to the user agent.

Using AT-SPI with Collections enhancements:
Specify SORT_ORDER_TAB in place of SORT_ORDER_FLOW in the calls to Collection:: apis.

Scenario 5: Collect all elements with a shortcut key in the document

Main scenario description:
In HTML documents, elements with a shortcut key have an accesskey attribute. For this scenario, the AT may create a list of all elements (usually interactive elements) in a document with a shortcut or access key and display the list in alphanumeric order with the key listed as the first character in each list item.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each element with a shortcut key.

Use of AT-SPI interfaces by ATs:

Elements with shortcut keys are by definition actionable (i.e. the Action interface is implemented). Proceed as in Scenario #2, then prune objects whose keyboard shortcuts are empty (Action::getKeybinding()).
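
The pruning and ordering steps can be sketched in Python as below. The (name, keybinding) pairs are invented data standing in for the results of querying each actionable object; an empty string represents an object with no keyboard shortcut.

```python
from typing import List, Tuple

def shortcut_list(elements: List[Tuple[str, str]]) -> List[str]:
    """Drop elements without a keybinding, then list the rest in
    alphanumeric key order with the key as the first character."""
    with_keys = [(key, name) for name, key in elements if key]
    with_keys.sort(key=lambda pair: pair[0].lower())
    return [f"{key} {name}" for key, name in with_keys]

ELEMENTS = [("Search", "s"), ("Logo", ""), ("Home", "1"), ("Contact", "C")]
entries = shortcut_list(ELEMENTS)
```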

Scenario 6: Navigate to the next, previous, current, first, or last frame in the document

Main scenario description:
This scenario allows an end user to navigate through all the frames in an HTML frameset.

Alternate scenarios:

  1. Collect all the frame elements in the frameset document for the purpose of creating a list of frames.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each frame.

Use of AT-SPI interfaces by ATs:

Recommend interfaces for navigation and collection of frame elements in a frameset, plus interfaces for getting the required element characteristics for the current one or a collection of frame elements. Recommend how to handle point of regard (POR) and performance (by limiting scope to the visible area, streaming, etc).


Main scenario: Navigate by frame

Alternate scenario 1: Collect a list of frames in a frameset

Scenario 7: Navigate to the next, previous, current, first, or last section identified by a heading in the document

Main scenario description:
A heading in an HTML document is identified by an h1, h2, ..., h6 element. In office documents, different heading levels are identified by a certain set of style attributes. For this scenario, the AT may provide sequential navigation to all the sections in a document with a heading.

Alternate scenarios:

  1. Collect all the headings in the document for the purpose of creating a list of headings.
  2. Navigate the heading levels as a tree structure by navigating to the first, last, next, previous, up, or down. Next and previous directions allow navigation to the same level items (like headings), and up and down directions allow the user to change levels. First and last causes the POR to move to the first and last items within the same level.
  3. (Future XHTML) Navigate to or collect other section document landmarks identified in XHTML using the role attribute.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each heading.

Use of AT-SPI interfaces by ATs:

  1. Headings should always implement Text (even if they are empty, save graphical information).
  2. Headings are identified via Text attributes, i.e. Text::getAttributes (0, ...). Relevant attributes (i.e. <h1>, etc.) should be present on the first character. However, a paranoid client may detect pathological cases by detecting the presence of heterogeneous attributes in the text (e.g. if the first attribute run does not encompass the entire character count), and iterating through the attribute ranges.
  3. If an object with appropriate 'heading' attributes also implements Image, or has an EMBEDS relation, the image description and/or information about the embedded object should be reported along with the explicit content of the object's Text interface (if non-empty).
  4. Since Text attributes may be any name-value pair, it is possible to express and identify a document landmark of any type in this way. It is permissible for the value of a name-value pair to be empty, for instance the presence of the 'h1' attribute might be indicated as text attribute "h1:". Alternatively, see the section on Desirable Roles for Document Objects and Navigation.
  5. In text content, the POR should be determined by the focused text element, and the caret offset within it. The caret offset may be queried programmatically (via the Text::caretOffset attribute), or by listening for "caret-moved" events.
  6. The locale of a text object is the result of the following cascade: Application locale (Application::getLocale, where Application may be obtained via Accessible::getApplication whenever window focus changed), document locale (Document::getLocale), and text attributes named "lang" or "locale". If the language or locale of document text changes within the document, the appropriate text attributes should bracket the locale change.
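
The locale cascade in item 6 can be sketched as follows. The sketch assumes that the most specific source wins (text attributes over document locale over application locale); the function and its parameters are hypothetical and simply take values the AT has already retrieved.

```python
from typing import Mapping, Optional

def resolve_locale(app_locale: Optional[str],
                   doc_locale: Optional[str],
                   text_attrs: Mapping[str, str]) -> Optional[str]:
    """Return the effective locale for a run of text.

    Assumed precedence: a "lang" or "locale" text attribute, then the
    document locale, then the application locale."""
    for name in ("lang", "locale"):
        value = text_attrs.get(name)
        if value:
            return value
    return doc_locale or app_locale

from_attr = resolve_locale("en_US", "fr_FR", {"lang": "de"})
from_doc = resolve_locale("en_US", "fr_FR", {})
from_app = resolve_locale("en_US", None, {})
```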


Main scenario: Navigate to all sections in a document identified by a heading

Using AT-SPI 1.6:
Traverse next/previous/first/last as in Scenario 1 (navigate by item), but test for the presence of the Text interface and an appropriate text attribute via Text::getAttributes at offset 0. Traversals should be careful of FLOWS_FROM and FLOWS_TO relations.
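
The attribute test, including the 'paranoid' scan for heterogeneous attribute runs, can be sketched as follows. The representation of attribute runs as (start, end, attributes) tuples is an assumption made for this Python illustration; the attribute names h1 to h6 follow the examples used in this section.

```python
from typing import Dict, List, Optional, Tuple

HEADING_ATTRS = {"h1", "h2", "h3", "h4", "h5", "h6"}
Run = Tuple[int, int, Dict[str, str]]  # (start offset, end offset, attributes)

def heading_level(runs: List[Run], char_count: int) -> Optional[int]:
    """Return the heading level of a text object, or None if it is not a heading."""
    if not runs:
        return None
    _, first_end, first_attrs = runs[0]
    candidates = [first_attrs]
    if first_end < char_count:
        # The first run does not cover the whole text: check every run.
        candidates = [attrs for _, _, attrs in runs]
    for attrs in candidates:
        hits = HEADING_ATTRS & set(attrs)
        if hits:
            return int(min(hits)[1])  # "h2" -> 2
    return None

simple = heading_level([(0, 5, {"h2": ""})], 5)
mixed = heading_level([(0, 2, {"bold": "true"}), (2, 5, {"h3": ""})], 5)
plain = heading_level([(0, 5, {"bold": "true"})], 5)
```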

Using AT-SPI Collection and Document enhancements:
From the POR, determine the containing Document via Accessible::getDocument. Retrieve a list of headings by constructing a match set of text attributes, and calling Collection::getNextChildren(...) with the specified attribute set and match rule MATCH_ANY. The specified length of the returned list may be '1'. The sort order should be SORT_ORDER_FLOW.

Alternate scenario 1: Collect a list of all headings in a document
As above, but if Collection is available, the returned list length may be as long as the AT wishes, subject to latency considerations.

Alternate scenario 2: Navigate heading levels as a tree structure
'Level' should be determined by either the use of multiple queries, or by querying the heading attributes of the elements in the returned list (which need to be presented to the end-user in any case). Depending on how the AT wishes to manage its IPC transport, it may wish to construct a local hierarchy, tag the list of document landmarks, or maintain multiple lists in order to efficiently present such structured navigation. At the AT's discretion, the application-side POR may be programmatically moved via Component::grabFocus and Text::setCaretPosition (bearing in mind that applications have the right to return FALSE to indicate that the request could not be honored).
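
One way to construct such a local hierarchy is sketched below in Python. It assumes the AT already holds a flow-ordered list of (level, title) pairs; the dictionary-based node structure is hypothetical and chosen only for brevity.

```python
from typing import Any, Dict, List, Tuple

def build_tree(headings: List[Tuple[int, str]]) -> List[Dict[str, Any]]:
    """Turn flow-ordered (level, title) pairs into a list of root nodes.

    Each heading becomes a child of the nearest preceding heading with a
    smaller level number, so skipped levels are tolerated."""
    roots: List[Dict[str, Any]] = []
    stack: List[Dict[str, Any]] = []
    for level, title in headings:
        node = {"level": level, "title": title, "children": []}
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        (stack[-1]["children"] if stack else roots).append(node)
        stack.append(node)
    return roots

HEADINGS = [(1, "Intro"), (2, "Scope"), (2, "Terms"), (1, "Use cases"), (3, "Deep")]
roots = build_tree(HEADINGS)
```

With this structure, next and previous move among siblings in a children list, while up and down move between a node and its parent or first child.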

Alternate scenario 3: Navigate to or collect other document landmarks with a role attribute
See the section on Desirable Roles for Document Objects and Navigation.

Scenario 8: Navigate to the next, previous, current, first, or last data table in the document

Main scenario description:
For this scenario, the AT may provide navigation to the tops of all the data tables in the document.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each table top.

Use of AT-SPI interfaces by ATs:

Recommend interfaces for navigation and collection of table elements, plus interfaces for getting the required element characteristics for the current one or a collection of table elements. Recommend how to handle point of regard (POR) and performance (by limiting scope to the visible area, streaming, etc).

Also, see the section on Desirable Roles for Document Objects and Navigation.

Scenario 9: Navigate to the next, previous, current, first, or last embedded object or graphic in the document

Main scenario description:
This scenario allows an end user to navigate sequentially to all the different embedded objects in a document, including images, pictures, diagrams, Flash content, Java applets, graphs, spreadsheets, and other document content imported from a different source file and/or document types. This navigation does not include navigation within the embedded objects.

Alternate scenarios:

  1. Collect all the embedded objects and their related characteristics in the document for the purpose of creating a list of embedded objects in the document.
  2. Navigate to or collect a list of embedded objects of a specific file or document type, such as a list of just images.
  3. After navigation to an embedded object, move keyboard focus into the embedded object. When done navigating that object, move focus back to the original document.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each embedded object or graphic (image).

Use of AT-SPI interfaces by ATs:

Recommend interfaces for navigation, collection, and moving focus in and out of embedded objects or graphics, plus interfaces for getting the required element characteristics for the current one or a collection of embedded objects or graphics. Recommend how to handle point of regard (POR) and performance (by limiting scope to the visible area, streaming, etc).

Also, see the section on Desirable Roles for Document Objects and Navigation.


Main scenario: Navigate to all types of embedded objects (and images) in the document

Alternate scenario 1: Collect a list of all types of embedded objects

Alternate scenario 2: Navigate to or collect a list of one type of embedded object

Alternate scenario 3: Move keyboard focus in and out of an embedded object

Scenario 10: Navigate table cells by moving across rows and up and down columns within a grid or table

Main scenario description:

If the POR is within a data table or a grid like a spreadsheet or a calendar, the AT may provide navigation of table cells by allowing the user to move up or down one table cell in a column, or left or right one table cell in a row.

Alternate scenarios:

  1. The AT may allow the user to move to the first or last cell in the table.
  2. The AT may allow the user to move to first or last cell in the row, or the top or bottom cell in a column.
  3. If some of the table cells span more than one row or column, the AT may allow the user to move left or right one spanned cell at a time in a row, or up or down one spanned cell at a time in a column.
  4. The AT may allow the user to read the top or bottom cell in the column or the first or last cell in a row, plus navigate from that cell, without moving the current point of regard. This scenario allows the user to read "header" and "footer" cells without losing their current reading position within a table.
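
Cell-by-cell movement and header lookup without moving the POR can be sketched as follows. This Python illustration models the table as a list of rows and the POR as a (row, column) pair; both are assumptions of the sketch, which ignores spanned cells.

```python
from typing import List, Tuple

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def move(pos: Tuple[int, int], direction: str,
         n_rows: int, n_cols: int) -> Tuple[int, int]:
    """Return the new (row, column), or the old one at a table edge."""
    d_row, d_col = MOVES[direction]
    row, col = pos[0] + d_row, pos[1] + d_col
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return (row, col)
    return pos

def column_header(table: List[List[str]], pos: Tuple[int, int]) -> str:
    """Read the top cell of the current column; pos itself is not changed."""
    return table[0][pos[1]]

TABLE = [["Name", "Role"], ["Ann", "Editor"], ["Bo", "Author"]]
por = (1, 1)
below = move(por, "down", len(TABLE), len(TABLE[0]))
at_edge = move(below, "down", len(TABLE), len(TABLE[0]))
header = column_header(TABLE, por)
```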

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each element in each table cell.

Use of AT-SPI interfaces by ATs:

Using AT-SPI 1.6:

The AT should bound document traversal using the accessible Table interface.

Recommend interfaces for navigation of table cells within a table or grid, plus interfaces for getting the required element characteristics for a table cell. Recommend how to handle point of regard (POR) and performance (by limiting scope to the visible area, streaming, etc).

Also, see the section on Desirable Roles for Document Objects and Navigation.


Main scenario: Navigate table cells, up and down columns and across rows

Alternate scenario 1: Move to first or last cell in a table

Using AT-SPI 1.6:
If an object which implements Table is encountered, additional heuristics may be needed in order to identify a suitable "first presentation item" (typically the table cell at row 0, column 0). The last item in a container may similarly be identified by traversing down the containers using Accessible::getChildAtIndex (Accessible::getChildCount() - 1).
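
The descent to the last item can be sketched in Python as below. Node is a hypothetical class whose child_count and child_at methods stand in for Accessible::getChildCount and Accessible::getChildAtIndex.

```python
class Node:
    """Hypothetical container with index-addressable children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def child_count(self) -> int:
        return len(self.children)

    def child_at(self, index: int) -> "Node":
        return self.children[index]

def last_item(container: Node) -> Node:
    """Repeatedly descend into the last child until a leaf is reached."""
    node = container
    while node.child_count() > 0:
        node = node.child_at(node.child_count() - 1)
    return node

table = Node("table", [
    Node("row 0", [Node("cell 0,0"), Node("cell 0,1")]),
    Node("row 1", [Node("cell 1,0"), Node("cell 1,1")]),
])
last = last_item(table).name
```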

Alternate scenario 2: Move to first or last cell in a row, top or bottom cell in a column

Alternate scenario 3: Navigate by spanned cell across a row, or up and down a column

Alternate scenario 4: Read header or footer cells without moving (without losing current reading position)

Scenario 11: Navigate to next, previous, current, first, last, up or down in a tree-style set of items or controls

Main scenario description:
For this scenario, the AT may provide tree-style navigation for a list with embedded lists of items, menu widgets, or folder-file type lists. Next and previous directions allow navigation to the same level items, and up and down directions allow the user to change levels. First and last causes the POR to move to the first and last items within the same level.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for each item in a tree-style set of items or controls.

Use of AT-SPI interfaces by ATs:

Using AT-SPI 1.6:

The AT should bound document traversal using the accessible Table interface.

Recommend interfaces for navigation of tree-style items or controls, plus interfaces for getting the required element characteristics for that item or control. Recommend how to handle point of regard (POR) and performance (by limiting scope to the visible area, streaming, etc).

Also, see the section on Desirable Roles for Document Objects and Navigation.

Use case: Where am I

Scenario 1: Ask where your point of regard is within a document and within containers in the document

Main scenario description:
For this scenario, the user wants to receive information about the current element characteristics plus type of parent container(s), location of element within a container(s), container title(s) and position(s) relative to the whole document, section heading, element position relative to the whole document, and document title. Where am I output varies based on the current POR, the type of element at the current POR, and the parent container type. Example of possible output by AT:

Select menu 1 of 3.
Labeled Search type.
Select menu item 2 of 5.
Form 1 of 5.
Heading level 3: BluePages.
At 59% of page.

Alternate scenarios:

  1. The AT may create a UI where successive Where am I requests generate more verbose or more terse responses.

Failed end conditions:

  1. The document or container is empty.

Successful end conditions:
For each potential point of regard, the AT may require the following element characteristics, if they exist, depending on the type of elements in the item at the current POR:

Table information if in a table:

  1. Caption and table summary
  2. Content for row and column headers
  3. Relative number (n of total number) for the table in the document
  4. Relative row and column number (x of total, y of total) within parent table
  5. Table type/role (data, spreadsheet, calendar)

Section information if in a section:

  1. Section type (page, frame, heading)
  2. Section title
  3. Relative number (n of total) for the section type in the document
  4. Level if in a section with a heading
  5. Relative item number (n of total) within the section

Form control information if on a form control:

  1. Group label for a control (such as LEGEND or OPTGROUP in HTML) if in a group
  2. Label or alternative text (title)
  3. Type of form control (role)
  4. State
  5. Relative number (n of total) of the parent form in the document
  6. Relative form control number (n of total) within the parent form

Map information if in a map:

  1. Relative area number (n of total) within the areas of a map
  2. Title attribute for map

List or menu information if within a menu or list:

  1. Type (role) - menu, simple list, definition list, ordered list, folder, navigation bar, etc
  2. Title from parent menu or list
  3. Relative number (n of total) of parent list or menu in the document
  4. Relative list item number (n of total) within the list

Link information if on a link:

  1. Relative link number (n of total) within the document

For all locations:

  1. Number of items in the document
  2. Relative item number (n of total) within the document
  3. Document title

Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for how to obtain each type of information that should be rendered for Where am I.
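
The "n of total" figures used in Where am I output can be computed as in the following Python sketch. It assumes the AT holds a flow-ordered list of (role, label) pairs for the document; the function name and output wording are illustrative only.

```python
from typing import List, Tuple

def where_am_i(items: List[Tuple[str, str]], por_index: int) -> List[str]:
    """Describe the item at por_index relative to its role and the document."""
    role, label = items[por_index]
    same_role = [i for i, (r, _) in enumerate(items) if r == role]
    n = same_role.index(por_index) + 1
    return [
        f"{role} {n} of {len(same_role)}.",
        f"Labeled {label}.",
        f"Item {por_index + 1} of {len(items)}.",
    ]

ITEMS = [
    ("Heading", "BluePages"),
    ("Link", "Home"),
    ("Select menu", "Search type"),
    ("Link", "Help"),
]
report = where_am_i(ITEMS, 3)
```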

Use case: Document Summary

Scenario 1: Ask for a summary of the types of elements and containers in the document

Main scenario description:
For this scenario, the user wants to know the document title and language as well as the number of tables, links, headings, frames, forms, controls, items, images, and pages.

Alternate scenarios:

  1. The application may implement a document summary feature available from their UI instead of this programming interface.

Failed end conditions:

  1. The document or container is empty.

Successful end conditions:
Refer to the section AT-SPI interfaces to obtain element characteristics for AT renderings for the desired statistics for the document summary.
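
Once the AT has collected the roles of the elements in a document, the summary statistics reduce to a simple tally, as in this Python sketch. The role names and the summarize function are hypothetical; how the roles are gathered is outside the scope of the sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple

def summarize(title: str, language: str,
              roles: List[str]) -> Tuple[str, str, Dict[str, int]]:
    """Return the title, the language, and a per-role element count."""
    counts = Counter(roles)
    return title, language, {role: counts[role] for role in sorted(counts)}

ROLES = ["heading", "link", "table", "link", "image", "link", "heading"]
title, language, counts = summarize("Example page", "en", ROLES)
```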

AT-SPI Interfaces to Obtain Element Characteristics for AT Renderings

The AT may require the following element characteristics, if they exist, depending on the type of elements required for the requested rendering:

Element characteristic Recommended AT-SPI interface to obtain it
Main text content for an element
  1. Hyperlinks: content should be obtained from the text lying between the Hyperlink's specified start/end offsets in the host Hypertext object.
  2. If start and end offsets are the same, query the Hyperlink's anchors for objects that implement Image, and present Accessible::getName or Accessible::getDescription
Alternative or descriptive text for an image or an image link or button (title, alt, name) Accessible::getName to get alt or Accessible::getDescription to get title?
Long description link for an image or image link (URL) Image::getImageDescription
Alternative content or descriptive text (title) for an embedded object
Label or alternative text (title) for a control
Group label for a control (such as LEGEND or OPTGROUP in HTML)
Alternative text (title attribute) for any element (such as an abbreviation or text link) or beginning of any container (such as a map, select menu, or frame)
Caption and table summary for the beginning of a table
Content for row and column headers if in a table cell
Relative number (n of total number) for the beginning of a form and for a table
Number of rows and columns for the beginning of a table
Number of controls within a form, areas within a map, or items in a list of links
Row and column number if in a table cell
Level number if item is a heading or in a section with a heading
Index or start + index for a list item which is a link
Shortcut key for an interactive element
Value for an interactive element
Role for any element
State for any interactive element
Actions for any interactive elements, including event handlers
Locale (language) for element Image::getLocale
Text object - result of cascade:
Accessible::getApplication, then
Application::getLocale, Document::getLocale, and text attributes named "lang" or "locale".
URL if a link
Source filename if an image
Text attributes
Title for target (current) frame or document
Long description link for a frame (URL)
Number of frames in a frameset
Relative number of the current frame
Section type (page, frame, heading)
Relative number (n of total number) for a section with a heading
Relative page number in a document (n of total number)
Relative item number (n of total number) within a document
Relative link number (n of total) within the document
Document language Accessible::getApplication, then
Application::getLocale, Document::getLocale
Number of form controls in a document
Number of images and embedded objects in a document

Desirable Roles for Document Objects and Navigation

To identify, interact with, and navigate to custom widgets in documents, such as DHTML widgets on Web pages, the assistive technology needs to be able to locate and obtain element characteristics for extensible roles, values, and states for any element. Extensible roles should also be used to identify and navigate to new types of document landmarks and text objects. In XHTML 2.0, Web authors will be able to identify new document landmarks using the role attribute. However, even in today's Web, word processing, PDF, spreadsheet, and presentation documents, there are custom widgets, document landmarks, and text objects that an AT cannot identify using an extensible role value.

Below is an initial list of document object and landmark roles that ATs should be able to identify, navigate to, and obtain element characteristics for. Some are already standard roles; others would be new roles.

Document
Preface
Appendix
Epilogue
Part
Article
Section
Division
BlockQuote
Caption
Table of contents
Table of figures
Table of tables
Index
Footnote
Endnote
Sidebar
Paragraph
Table, Table cell
List
Figure
Formula
Equation
Heading
Reference
Bibliography
Credits
Link
Example
Quotation
Instructions
Contact Information
Return Address
Salutation
Signature
Date
Form
Form Section or Subsection
Interactive field types
Form controls (radio button, check box, text field, text area, password field, combo box, select list, multi-select)
Advertisement
Comments
Multi-media content (sound clips, movies, etc).
Abbreviation
Image, Graph, Chart, Diagram
Spreadsheet
Calendar
Menu
Tabbed section
Map, Map areas
Bookmark
XHTML 2.0 sections identified through role attribute