up:
Chapter 11 -- Coprocessing and Multiprocessing
prev: 11.1 Coprocessing
next:
Chapter 12 -- Debugging
11.2 General Multiprocessing
The components of the general multiprocessing interface include:
- The LOCK# signal
- The
LOCK instruction prefix, which gives programmed control of the
LOCK# signal.
- Automatic assertion of the LOCK# signal with implicit memory updates
by the processor
11.2.1 LOCK and the LOCK# Signal
The LOCK instruction prefix and its corresponding output signal LOCK# can
be used to prevent other bus masters from interrupting a data movement
operation.
LOCK may only be used with the following
80386 instructions when
they modify memory. An undefined-opcode exception results from using
LOCK
before any instruction other than:
- Bit test and change: BTS, BTR,
BTC.
- Exchange: XCHG.
- Two-operand arithmetic and logical: ADD,
ADC,
SUB,
SBB,
AND,
OR,
XOR.
- One-operand arithmetic and logical:
INC,
DEC,
NOT, and
NEG.
A locked instruction is only guaranteed to lock the area of memory defined
by the destination operand, but it may lock a larger memory area. For
example, typical 8086 and 80286 configurations lock the entire physical
memory space. The area of memory defined by the destination operand is
guaranteed to be locked against access by a processor executing a locked
instruction on exactly the same memory area, i.e., an operand with
identical starting address and identical length.
The integrity of the lock is not affected by the alignment of the memory
field. The
LOCK signal is asserted for as many bus
cycles as necessary to update the entire operand.
11.2.2 Automatic Locking
In several instances, the processor itself initiates activity on the data
bus. To help ensure that such activities function correctly in
multiprocessor configurations, the processor automatically asserts the LOCK#
signal. These instances include:
- Acknowledging interrupts.
After an interrupt request, the interrupt controller uses the data bus
to send the interrupt ID of the interrupt source to the CPU. The CPU
asserts LOCK# to ensure that no other data appears on the data bus
during this time.
- Setting busy bit of TSS descriptor.
The processor tests and sets the busy-bit in the type field of the TSS
descriptor when switching to a task. To ensure that two different
processors cannot simultaneously switch to the same task, the processor
asserts LOCK# while testing and setting this bit.
- Loading of descriptors.
While copying the contents of a descriptor from a descriptor table into
a segment register, the processor asserts LOCK# so that the descriptor
cannot be modified by another processor while it is being loaded. For
this action to be effective, operating-system procedures that update
descriptors should adhere to the following steps:
- Use a locked update to the access-rights byte to mark the
descriptor not-present.
- Update the fields of the descriptor. (This may require several
memory accesses; therefore, LOCK cannot be used.)
- Use a locked update to the access-rights byte to mark the
descriptor present again.
- Updating page-table A and D bits.
The processor exerts LOCK# while updating the A (accessed) and D
(dirty) bits of page-table entries. Also the processor bypasses the
page-table cache and directly updates these bits in memory.
- Executing XCHG instruction.
The 80386 always asserts LOCK during an
XCHGinstruction that
references memory (even if the
LOCK prefix is not used).
11.2.3 Cache Considerations
Systems programmers must take care when updating shared data that may also
be stored in on-chip registers and caches. With the 80386, such shared
data includes:
- Descriptors, which may be held in segment registers.
A change to a descriptor that is shared among processors should be
broadcast to all processors. Segment registers are effectively
"descriptor caches". A change to a descriptor will not be utilized by
another processor if that processor already has a copy of the old
version of the descriptor in a segment register.
- Page tables, which may be held in the page-table cache.
A change to a page table that is shared among processors should be
broadcast to all processors, so that others can flush their page-table
caches and reload them with up-to-date page tables from memory.
Systems designers can employ an interprocessor interrupt to handle the
above cases. When one processor changes data that may be cached by other
processors, it can send an interrupt signal to all other processors that may
be affected by the change. If the interrupt is serviced by an interrupt
task, the task switch automatically flushes the segment registers. The task
switch also flushes the page-table cache if the PDBR (the contents of CR3)
of the interrupt task is different from the PDBR of every other task.
In multiprocessor systems that need a cacheability signal from the CPU, it
is recommended that physical address pin A31 be used to indicate
cacheability. Such a system can then possess up to 2 Gbytes of physical
memory. The virtual address range available to the programmer is not
affected by this convention.
up:
Chapter 11 -- Coprocessing and Multiprocessing
prev: 11.1 Coprocessing
next:
Chapter 12 -- Debugging