Instruction/ maintenance manual of the product 253668-032US Intel
Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3A: System Programming Guide, Part 1
NOTE: The Intel® 64 and IA-32 Architectures Software Developer’s
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
CONTENTS
CHAPTER 1 ABOUT THIS MANUAL
1.1 PROCESSORS COVERED IN THIS MANUAL
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE
2.7.5 Controlling the Processor
2.7.6 Reading Performance-Monitoring and Time-Stamp Counters
4.9.3 Caching Paging-Related Information about Memory Typing
4.10 CACHING TRANSLATION INFORMATION
5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode
5.8.8 Fast System Calls in 64-bit Mode
6.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE
6.14.1 64-Bit Mode IDT
CHAPTER 8 MULTIPLE-PROCESSOR MANAGEMENT
8.1 LOCKED ATOMIC OPERATIONS
8.1.1 Guaranteed Atomic Operations
8.7.9 Memory Ordering
8.7.10 Serializing Instructions
9.5 MEMORY TYPE RANGE REGISTERS (MTRRS)
9.6 INITIALIZING SSE/SSE2/SSE3/SSSE3 EXTENSIONS
CHAPTER 10 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
10.1 LOCAL AND I/O APIC OVERVIEW
10.2 SYSTEM BUS VS.
10.7.2.4 Deriving Logical x2APIC ID from the Local x2APIC ID
10.7.2.5 Broadcast/Self Delivery Mode
11.11 MEMORY TYPE RANGE REGISTERS (MTRRS)
11.11.1 MTRR Feature Identification
13.1.6.1 Numeric Error flag and IGNNE#
13.2 EMULATION OF SSE/SSE2/SSE3/SSSE3/SSE4 EXTENSIONS
15.3 MACHINE-CHECK MSRS
15.3.1 Machine-Check Global Control MSRs
CHAPTER 16 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
16.1 OVERVIEW OF DEBUG SUPPORT FACILITIES
16.2 DEBUG REGISTERS
16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS)
16.10 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (P6 FAMILY PROCESSORS)
CHAPTER 18 MIXING 16-BIT AND 32-BIT CODE
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES
18.2 MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT
19.18.6.3 Numeric Underflow Exception (#U)
19.18.6.4 Exception Precedence
19.25 EXCEPTIONS AND/OR EXCEPTION CONDITIONS
19.25.1 Machine-Check Architecture
20.5 VIRTUAL-MACHINE CONTROL STRUCTURE
20.6 DISCOVERING SUPPORT FOR VMX
CHAPTER 22 VMX NON-ROOT OPERATION
22.1 INSTRUCTIONS THAT CAUSE VM EXITS
22.1.1 Relative Priority of Faults and VM Exits
23.3.1.3 Checks on Guest Descriptor-Table Registers
23.3.1.4 Checks on Guest RIP and RFLAGS
24.5.6 Clearing Address-Range Monitoring
24.6 LOADING MSRS
26.11 SMBASE RELOCATION
26.11.1 Relocating SMRAM to an Address Above 1 MByte
27.7.1 Handling VM Exits Due to Exceptions
27.7.1.1 Reflecting Exceptions to Guest Software
CHAPTER 29 HANDLING BOUNDARY CONDITIONS IN A VIRTUAL MACHINE MONITOR
29.1 OVERVIEW
30.5 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL® ATOM™ MICROARCHITECTURE)
30.6 PERFORMANCE MONITORING FOR PROCESSORS BASED ON INTEL® MICROARCHITECTURE (NEHALEM)
30.10.3 Incrementing the Time-Stamp Counter
30.10.4 Non-Halted Reference Clockticks
B.3 MSRS IN THE INTEL® ATOM™ PROCESSOR FAMILY
B.4 MSRS IN THE INTEL® MICROARCHITECTURE (NEHALEM)
E.4.3 Processor Model Specific Error Code Field
E.4.3.1
H.4.2 Natural-Width Read-Only Data Fields
H.4.3 Natural-Width Guest-State Fields
FIGURES
Figure 1-1. Bit and Byte Order
Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation
Figure 6-2. IDT Gate Descriptors
Figure 6-3. Interrupt Procedure Call
Figure 10-14. Error Status Register (ESR) in x2APIC Mode
Figure 10-15. Divide Configuration Register
Figure 14-11. IA32_THERM_STATUS Register
Figure 14-12. IA32_THERM_INTERRUPT Register
Figure 29-1. Host External Interrupts and Guest Virtual Interrupts
Figure 30-1. Layout of IA32_PERFEVTSELx MSRs
TABLES
Table 2-1. Action Taken By x87 FPU Instructions for Different Combinations of EM, MP, and TS
Table 2-2. Summary of System Instructions
Table 8-2. Initial APIC IDs for the Logical Processors in a System that has Two Physical Processors Supporting Dual-Core and Intel Hyper-Threading Technology
Table 8-3.
Table 13-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2, SSE3, EM, MP, and TS
Table 13-2. Action Taken for Combinations of OSFXSR, SSSE3, SSE4, EM, and TS
Table 13-3. XSAVE Header Format
Table 21-4. Format of Pending-Debug-Exceptions
Table 21-5. Definitions of Pin-Based VM-Execution Controls
Table 30-1. UMask and Event Select Encodings for Pre-Defined Architectural Performance Events
Table 30-2. Core Specificity Encoding within a Non-Architectural Umask
Table 30-3.
Table A-15. List of Metrics Available for Replay Tagging (For Replay Event Only)
Table A-16. Event Mask Qualification for Logical Processors
Table A-17.
Table F-2. Short Message (21 Cycles)
Table F-3. Non-Focused Lowest Priority Message (34 Cycles)
CHAPTER 1 ABOUT THIS MANUAL
The Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1 (order number 253668) and the Intel® 64 and.
• Dual-Core Intel® Xeon® processor LV
• Intel® Core™2 Duo processor
• Intel® Core™2 Quad processor Q6000 series
• Intel® Xeon® processor 3000.
The Intel® Core™ i7 processor and the Intel® Core™ i5 processor are based on the Intel® microarchitecture (Nehalem) and support Intel 64 architecture. Processors based on the Next Generation Intel Processor, codenamed Westmere, support Intel 64 architecture.
Chapter 6 — Interrupt and Exception Handling. Describes the basic interrupt mechanisms defined in the Intel 64 and IA-32 architectures, shows how interrupts and exceptions relate to protection, and describes how the architecture handles each exception type.
Chapter 16 — Debugging, Branch Profiles and Time-Stamp Counter. Describes the debugging registers and other debug mechanisms provided in Intel 64 or IA-32 processors. This chapter also describes the time-stamp counter.
Chapter 17 — 8086 Emulation.
Chapter 30 — Performance Monitoring. Describes the Intel 64 and IA-32 architectures’ facilities for monitoring performance.
Appendix A — Performance-Monitoring Events. Lists architectural performance events. Non-architectural performance events (i.
means the bytes of a word are numbered starting from the least significant byte. Figure 1-1 illustrates these conventions.
1.3.2 Reserved Bits and Software Compatibility
In many register and memory layout descriptions, certain bits are marked as reserved.
1.3.3 Instruction Operands
When instructions are represented symbolically, a subset of assembly language is used. In this subset, an instruction has the following format:
label: mnemonic argument1, argument2, argument3
where:
• A label is an identifier which is followed by a colon.
For example, a program can keep its code (instructions) and stack in separate segments. Code addresses would always refer to the code space, and stack addresses would always refer to the stack space.
1.3.7 Exceptions
An exception is an event that typically occurs when an instruction causes an error. For example, an attempt to divide by zero generates an exception. However, some exceptions, such as breakpoints, occur under other conditions.
This example refers to a page-fault exception under conditions where an error code naming a type of fault is reported. Under some conditions, exceptions which produce error codes may not be able to report an accurate code.
• Intel® 64 Architecture Processor Topology Enumeration: http://softwarecommunity.intel.com/articles/eng/3887.htm
• Intel® Trusted Execution Technology Measured Launched Environment Programming Guide, http://www.
CHAPTER 2 SYSTEM ARCHITECTURE OVERVIEW
IA-32 architecture (beginning with the Intel386 processor family) provides extensive support for operating-system and system-development software. This support offers multiple modes of operation, which include:
• Real mode, protected mode, virtual 8086 mode, and system management mode.
initiates the switch from real-address mode to protected mode. If IA-32e mode operation is desired, software also initiates a switch from protected mode to IA-32e mode.
Figure 2-1. IA-32 System-Level Registers and Data Structures
Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode
2.1.1 Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1.
The architecture also defines a set of special descriptors called gates (call gates, interrupt gates, trap gates, and task gates). These provide protected gateways to system procedures and handlers that may operate at a different privilege level than application programs and most procedures.
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
The IDTR register is expanded to hold a 64-bit base address. Task gates are not supported.
2.1.5 Memory Management
System architecture supports either direct physical addressing of memory or virtual memory (through paging).
2.1.6 System Registers
To assist in initializing the processor and controlling system operations, the system architecture provides system flags in the EFLAGS.
On systems that support IA-32e mode, the extended feature enable register (IA32_EFER) is available. This model-specific register controls activation of IA-32e mode and other IA-32e mode operations.
running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the SMI.
• Virtual-8086 mode — In protected mode, the processor supports a quasi-operating mode known as virtual-8086 mode.
The VM flag in the EFLAGS register determines whether the processor is operating in protected mode or virtual-8086 mode. Transitions between protected mode and virtual-8086 mode are generally carried out as part of a task switch or a return from an interrupt or exception handler.
IF Interrupt enable (bit 9) — Controls the response of the processor to maskable hardware interrupt requests (see also: Section 6.3.2, “Maskable Hardware Interrupts”). The flag is set to respond to maskable hardware interrupts; cleared to inhibit maskable hardware interrupts.
changing the state of this flag can generate unexpected exceptions in application programs. See also: Section 7.4, “Task Linking.”
RF Resume (bit 16) — Controls the processor’s response to instruction-breakpoint conditions.
VIP Virtual interrupt pending (bit 20) — Set by software to indicate that an interrupt is pending; cleared to indicate that no interrupt is pending. This flag is used in conjunction with the VIF flag. The processor reads this flag but never modifies it.
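As an illustration of the bit positions given for these system flags, the following Python sketch tests individual bits of an EFLAGS value. The names EFLAGS_BITS, flag_is_set, and decode_flags are hypothetical helpers introduced here, and the sample value is made up; only the bit numbers (IF = 9, RF = 16, VIP = 20) come from the text.

```python
# Bit positions quoted in the flag descriptions above.
# The dictionary and helper names are illustrative, not part of the manual.
EFLAGS_BITS = {"IF": 9, "RF": 16, "VIP": 20}


def flag_is_set(eflags: int, name: str) -> bool:
    """Return True if the named system flag is set in an EFLAGS image."""
    return bool((eflags >> EFLAGS_BITS[name]) & 1)


def decode_flags(eflags: int) -> dict:
    """Decode every flag listed in EFLAGS_BITS from an EFLAGS image."""
    return {name: flag_is_set(eflags, name) for name in EFLAGS_BITS}


# Made-up EFLAGS image with IF and VIP set and RF clear.
sample = (1 << 9) | (1 << 20)
decoded = decode_flags(sample)
```

The sketch only reads a saved image of the register; it says nothing about which privilege level may change each flag.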
2.4.1 Global Descriptor Table Register (GDTR)
The GDTR register holds the base address (32 bits in protected mode; 64 bits in IA-32e mode) and the 16-bit table limit for the GDT.
2.4.3 IDTR Interrupt Descriptor Table Register
The IDTR register holds the base address (32 bits in protected mode; 64 bits in IA-32e mode) and 16-bit table limit for the IDT. The base address specifies the linear address of byte 0 of the IDT; the table limit specifies the number of bytes in the table.
• The MOV CRn instructions do not check that addresses written to CR2 and CR3 are within the linear-address or physical-address limitations of the implementation.
• Register CR8 is available in 64-bit mode only.
When loading a control register, reserved bits should always be set to the values previously read. The flags in control registers are:
PG Paging (bit 31 of CR0) — Enables paging when set; disables paging when clear.
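The rule that reserved bits keep the values previously read amounts to a read-modify-write. A minimal Python sketch of that pattern follows. It models only the bit arithmetic: the function name and the sample CR0 value are assumptions made for illustration, and real system code would read and write the register with MOV instructions rather than plain integers.

```python
CR0_PG = 1 << 31  # Paging flag, bit 31 of CR0 (from the text above)


def set_flag_preserving_others(previous: int, flag_mask: int, enable: bool) -> int:
    """Return a new register image that changes only the bits in flag_mask.

    Every other bit, including reserved ones, keeps the value that was read.
    """
    if enable:
        return previous | flag_mask
    return previous & ~flag_mask


# Made-up "value previously read" from CR0.
old_cr0 = 0x60000011
new_cr0 = set_flag_preserving_others(old_cr0, CR0_PG, True)
```

Starting from the value that was read, rather than from a constant, is what keeps software compatible with processors that later assign meaning to reserved bits.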
See also: Section 11.5.3, “Preventing Caching,” and Section 11.5, “Cache Control.”
NW Not Write-through (bit 29 of CR0) — When the NW and CD fla.
delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task. The processor sets this flag on every task switch and tests it when executing x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
EM Emulation (bit 2 of CR0) — Indicates that the processor does not have an internal or external x87 FPU when set; indicates an x87 FPU is present when clear. This flag also affects the execution of MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
flag is set, caching of the page-directory is prevented; when the flag is clear, the page-directory can be cached.
when set; when clear, processor aliases references to registers DR4 and DR5 for compatibility with software written to run on earlier IA-32 processors. See also: Section 16.2.2, “Debug Registers DR4 and DR5.
processor will generate an invalid opcode exception (#UD) if it attempts to execute any SSE/SSE2/SSE3 instruction, with the exception of PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH, CRC32, and POPCNT.
all interrupts are enabled. This field is available in 64-bit mode. A value of 15 means all interrupts will be disabled.
2.5.1 CPUID Qualification of Control Register Flags
The VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, PCE, OSFXSR, and OSXMMEXCPT flags in control register CR4 are model specific.
state, SSE state, or a future processor extended state) is represented by a bit in XCR0. The OS can enable future processor extended states in a forward manner by specifying the appropriate bit mask value using the XSETBV instruction according to the results of the CPUID leaf 0DH.
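One way an operating system might derive the XCR0 bit mask is to intersect the states it wants with the states the processor reports. The Python sketch below shows that arithmetic only. The assignments of bit 0 to x87 FPU state and bit 1 to SSE state, and the requirement that bit 0 always stay set, are taken from the full manual rather than from this excerpt; the `supported` argument is a stand-in for the mask that CPUID leaf 0DH reports, and the function name is hypothetical.

```python
XCR0_X87 = 1 << 0  # x87 FPU state (assumed bit assignment, see lead-in)
XCR0_SSE = 1 << 1  # SSE state (assumed bit assignment, see lead-in)


def xcr0_mask(supported: int, wanted: int) -> int:
    """Return the mask to hand to XSETBV.

    Only states that are both wanted and supported are enabled, and the
    x87 bit is always kept set.
    """
    return (wanted & supported) | XCR0_X87


# A processor reporting x87 and SSE only; the OS asks for one more state.
mask = xcr0_mask(supported=0b011, wanted=0b111)
```

Masking against the reported value is what lets the same code enable a future extended state on a processor that supports it and skip it on one that does not.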
Instruction   Description          Useful to Application?   Protected from Application?
SLDT          Store LDT Register   No                       No
LGDT          Load GDT Register    No                       Yes
SGDT          Store GDT Register   No                       No
LTR           Load Task Register   No                       Yes
STR           Store.
2.7.1 Loading and Storing System Registers
The GDTR, LDTR, IDTR, and TR registers each have a load and store instruction for loading data into and storing data from the register:
• LGDT (Load GDTR Register) — Loads the GDT base address and limit from memory into the GDTR register.
The LMSW (load machine status word) and SMSW (store machine status word) instructions operate on bits 0 through 15 of control register CR0. These instructions are provided for compatibility with the 16-bit Intel 286 processor.
Instructions),” for a detailed explanation of the function and use of this instruction.
2.7.3 Loading and Storing Debug Registers
Internal debugging facilities in the processor are controlled by a set of 8 debug registers (DR0-DR7).
introduced with the Pentium Pro processor). If any non-wake events are pending during shutdown, they will be handled after the wake event from shutdown is processed (for example, A20M# interrupts).
Fixed-function performance counters record only specific events that are defined in Chapter 30, “Performance Monitoring”, and the width/number of fixed-function counters are enumerated by CPUID leaf 0AH.
2.7.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode
RDMSR and WRMSR require an index to specify the address of an MSR. In 64-bit mode, the index is 32 bits; it is specified using ECX.
2.7.
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the Intel 64 and IA-32 architecture’s protected-mode memory management facilities, including the physical memory requirements, segmentation mechanism, and paging mechanism.
segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment.
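The base-plus-offset step can be written out as a short Python sketch. The function name, the sample base and offset, and the truncation of the sum to a 32-bit linear address are assumptions made for this illustration; limit and type checks are left out.

```python
def linear_address(segment_base: int, offset: int, width_bits: int = 32) -> int:
    """Add the offset of a logical address to the segment base.

    The sum is truncated to width_bits, modeling a 32-bit linear-address
    space by default (an assumption of this sketch).
    """
    return (segment_base + offset) & ((1 << width_bits) - 1)


# A segment whose base is 0x00100000, accessed at offset 0x1234.
addr = linear_address(0x00100000, 0x1234)

# With a base of 0 (a flat model), the linear address equals the offset.
flat = linear_address(0, 0x1234)
```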
storage. When using paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages.
FFFF_FFF0H. RAM (DRAM) is placed at the bottom of the address space because the initial base address for the DS data segment after reset initialization is 0.
More complexity can be added to this protected flat model to provide more protection. For example, for the paging mechanism to provide isolation betw.
Access checks can be used to protect not only against referencing an address outside the limit of a segment, but also against performing disallowed operations in certain segments.
In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address.
3.3.1 Intel® 64 Processors and Physical Address Space
On processors that support Intel 64 architecture (CPUID.80000001:EDX[29] = 1), the size of the physical address range is implementation-specific and indicated by CPUID.
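The notation CPUID.80000001:EDX[29] names bit 29 of the EDX value returned for that leaf. A small Python sketch of the bit test follows; it works on an EDX value supplied by the caller, since plain Python cannot execute CPUID, and both function names are hypothetical.

```python
def cpuid_bit(register_value: int, bit: int) -> bool:
    """Test one bit of a register value returned by CPUID."""
    return bool((register_value >> bit) & 1)


def supports_intel64(edx_80000001: int) -> bool:
    """Check CPUID.80000001:EDX[29] as described in the text above."""
    return cpuid_bit(edx_80000001, 29)


# Made-up EDX values for illustration.
with_intel64 = supports_intel64(1 << 29)
without_intel64 = supports_intel64(0)
```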
If paging is not used, the processor maps the linear address directly to a physical address (that is, the linear address goes out on the processor’s address bus).
TI (table indicator) flag (Bit 2) — Specifies the descriptor table to use: clearing this flag selects the GDT; setting this flag selects the current LDT.
Requested Privilege Level (RPL) (Bits 0 and 1) — Specifies the privilege level of the selector.
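The selector fields can be separated with shifts and masks, as in the Python sketch below. The TI and RPL positions come from the text; the index field in bits 3 through 15 is taken from the full manual, since this excerpt begins after its description. The Selector type and decode_selector are names invented for the example.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # descriptor index, bits 3 through 15 (see lead-in)
    table: str  # "GDT" when TI (bit 2) is clear, "LDT" when it is set
    rpl: int    # requested privilege level, bits 0 and 1


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return Selector(
        index=selector >> 3,
        table="LDT" if selector & 0x4 else "GDT",
        rpl=selector & 0x3,
    )


# Made-up selector values.
gdt_selector = decode_selector(0x002B)
ldt_selector = decode_selector(0x000F)
```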
For a program to access a segment, the segment selector for the segment must have been loaded in one of the segment registers. So, although a system can define thousands of segments, only 6 can be available for immediate use.
3.4.4 Segment Loading Instructions in IA-32e Mode
Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit, and attribute) in segment descriptor registers are ignored.
3.4.5 Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of a segment, as well as access control and status information.
to the segment limit. Offsets greater than the segment limit generate general-protection exceptions (#GP). For expand-down segments, the segment limit has the reverse function; the offset can range from the segment limit plus 1 to FFFFFFFFH or FFFFH, depending on the setting of the B flag.
store its own data, such as information regarding the whereabouts of the missing segment.
D/B (default operation size/default stack pointer size and/or.
G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is clear, the segment limit is interpreted in byte units; when the flag is set, the segment limit is interpreted in 4-KByte units.
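The effect of the granularity flag on limit checking can be sketched in Python as follows. Several details are assumptions drawn from the full manual rather than this excerpt: the limit field is 20 bits wide, the low 12 bits of the scaled limit are treated as all ones, and for expand-down segments the valid offsets start just above the limit. The function names are hypothetical.

```python
def effective_limit(limit_field: int, g_flag: bool) -> int:
    """Scale the 20-bit segment limit field according to the G flag."""
    limit_field &= 0xFFFFF
    if g_flag:
        # 4-KByte units: the low 12 bits of an offset are not checked.
        return (limit_field << 12) | 0xFFF
    return limit_field  # byte units


def offset_in_bounds(offset: int, limit: int,
                     expand_down: bool = False, b_flag: bool = True) -> bool:
    """Return True if the offset passes the limit check."""
    if not expand_down:
        return 0 <= offset <= limit
    upper = 0xFFFFFFFF if b_flag else 0xFFFF
    return limit < offset <= upper


largest_byte_limit = effective_limit(0xFFFFF, g_flag=False)
largest_page_limit = effective_limit(0xFFFFF, g_flag=True)
```

With G set, the largest limit field covers the whole 4-GByte linear-address space, which is how a flat model describes a segment of that size.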
Stack segments are data segments which must be read/write segments. Loading the SS register with a segment selector for a nonwritable data segment generates a general-protection exception (#GP).
For code segments, the three low-order bits of the type field are interpreted as accessed (A), read enable (R), and conforming (C). Code segments can be execute-only or execute/read, depending on the setting of the read-enable bit.
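A short Python sketch of decoding the three low-order type bits of a code-segment descriptor follows. The ordering used here (A in bit 0, R in bit 1, C in bit 2 of the type field) is an assumption taken from the full manual's type-field table, and decode_code_type is a hypothetical helper name.

```python
def decode_code_type(type_field: int) -> dict:
    """Decode the accessed, read-enable, and conforming bits of a code segment."""
    return {
        "accessed": bool(type_field & 0b001),
        "readable": bool(type_field & 0b010),
        "conforming": bool(type_field & 0b100),
    }


# Type 1010b: a code segment that is execute/read, nonconforming, not accessed.
execute_read = decode_code_type(0b1010)

# Type 1000b: execute-only.
execute_only = decode_code_type(0b1000)
```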
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• Interrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors.
See also: Section 3.5.1, “Segment Descriptor Tables”, and Section 7.2.2, “TSS Descriptor” (for more information on the system-segment descriptors); see Section 5.8.3, “Call Gates”, Section 6.
Each system must have one GDT defined, which may be used for all programs and tasks in the system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for each separate task being run, or some or all tasks can share the same LDT.
3.5.2 Segment Descriptor Tables in IA-32e Mode
In IA-32e mode, a segment descriptor table can contain up to 8192 (2^13) 8-byte descriptors. An entry in the segment descriptor table can be 8 bytes. System descriptors are expanded to 16 bytes (occupying the space of two entries).
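The space accounting implied by these figures can be worked through in a few lines of Python. The constants come from the text; the function names and the sample descriptor counts are made up for the example.

```python
ENTRY_BYTES = 8     # size of one table entry
MAX_ENTRIES = 8192  # 2^13 entries per descriptor table


def slots_needed(eight_byte_descriptors: int, system_descriptors: int) -> int:
    """Count table entries used; each 16-byte system descriptor takes two."""
    return eight_byte_descriptors + 2 * system_descriptors


def table_bytes(eight_byte_descriptors: int, system_descriptors: int) -> int:
    """Return the number of bytes the descriptors occupy in the table."""
    return slots_needed(eight_byte_descriptors, system_descriptors) * ENTRY_BYTES


def fits(eight_byte_descriptors: int, system_descriptors: int) -> bool:
    """Check that the descriptors fit within one table."""
    return slots_needed(eight_byte_descriptors, system_descriptors) <= MAX_ENTRIES


# Three 8-byte descriptors plus two 16-byte system descriptors.
used = table_bytes(3, 2)
```

A full table of 8192 entries occupies 65536 bytes, which matches what a 16-bit table limit can describe.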
CHAPTER 4 PAGING
Chapter 3 explains how segmentation converts logical addresses to linear addresses. Paging (or linear-address translation) is the process of translating linear addresses so that they can be used to access memory or I/O devices.
paging modes. Section 4.1.3 discusses how CR0.WP, CR4.PSE, CR4.PGE, and IA32_EFER.NXE modify the operation of the different paging modes.
4.1.1 Three Paging Modes
If CR0.PG = 0, paging is not used. The logical processor treats all linear addresses as if they were physical addresses.
linear addresses larger than 32 bits, 32-bit paging and PAE paging translate 32-bit linear addresses. Because it is used only if IA32_EFER.LME = 1, IA-32e paging is used only in IA-32e mode. (In fact, it is the use of IA-32e paging that defines IA-32e mode.
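The way the three control bits select a paging mode can be summarized in a Python sketch. The CR0.PG = 0 and IA32_EFER.LME = 1 cases follow the text; the condition for 32-bit paging (CR0.PG = 1 with CR4.PAE = 0) and the treatment of PG = 1, PAE = 0, LME = 1 as a combination software cannot reach are taken from the full manual rather than this excerpt. The function name and return strings are hypothetical.

```python
def paging_mode(cr0_pg: bool, cr4_pae: bool, efer_lme: bool) -> str:
    """Name the paging mode selected by CR0.PG, CR4.PAE, and IA32_EFER.LME."""
    if not cr0_pg:
        return "none"  # linear addresses are treated as physical addresses
    if not cr4_pae:
        if efer_lme:
            # Assumed unreachable: paging cannot be enabled in this state.
            raise ValueError("CR0.PG = 1 with CR4.PAE = 0 and IA32_EFER.LME = 1")
        return "32-bit"
    return "IA-32e" if efer_lme else "PAE"


modes = [
    paging_mode(False, True, True),
    paging_mode(True, False, False),
    paging_mode(True, True, False),
    paging_mode(True, True, True),
]
```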
enable these modes and make transitions between them. The following items identify certain limitations and other details:
• IA32_EFER.LME cannot be modified while paging is enabled (CR0.PG = 1). Attempts to do so using WRMSR cause a general-protection exception (#GP(0)).
• Software can always disable paging by clearing CR0.PG with MOV to CR0.
• Software can make transitions between 32-bit paging and PAE paging by changing the value of CR4.PAE with MOV to CR4.
• Software cannot make transitions directly between IA-32e paging and either of the other two paging modes.
4.1.4 Enumeration of Paging Features by CPUID
Software can discover support for different paging features using the CPUID instruction:
• PSE: page-size extensions for 32-bit paging. If CPUID.01H:EDX.PSE [bit 3] = 1, CR4.PSE may be set to 1, enabling support for 4-MByte pages with 32-bit paging (see Section 4.
4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW
All three paging modes translate linear addresses using hierarchical paging structures. This section provides an overview of their operation. Section 4.3, Section 4.4, and Section 4.
and bits 20:12 identify a fourth. Again, the last identifies the page frame. (See Figure 4-8 for an illustration.)
The translation process in each of the examples above completes by identifying a page frame. However, the paging structures may be configured so that translation terminates before doing so.
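For the four-entry case, the index fields can be pulled out of a linear address with shifts and masks, as in the Python sketch below. Only the bits 20:12 field is stated in this excerpt; the other three fields (bits 47:39, 38:30, and 29:21) and the 12-bit page offset are taken from the full manual's description of IA-32e paging with 4-KByte pages. The dictionary keys and function name are labels chosen for this example.

```python
def ia32e_indices(linear: int) -> dict:
    """Split a 48-bit linear address into four 9-bit indices and an offset."""
    return {
        "first": (linear >> 39) & 0x1FF,   # bits 47:39 (assumed, see lead-in)
        "second": (linear >> 30) & 0x1FF,  # bits 38:30 (assumed)
        "third": (linear >> 21) & 0x1FF,   # bits 29:21 (assumed)
        "fourth": (linear >> 12) & 0x1FF,  # bits 20:12 (from the text)
        "offset": linear & 0xFFF,          # bits 11:0 within the page frame
    }


# Made-up address built so that each field holds a distinct small value.
sample = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
fields = ia32e_indices(sample)
```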
Vol. 3 4-9 PAG I NG corresponds to 1 TByte, linear addresses are limited to 32 bits; at most 4 GBytes of linear-address space may be accessed at any given time. 32-bit paging uses a hierarchy of paging structures to produce a translation for a linear address.
32-bit paging may map linear addresses to either 4-KByte pages or 4-MByte pages. Figure 4-2 illustrates the translation process when it uses a 4-KByte page; Figure 4-3 covers the case of a 4-MByte page.
Because a PDE is identified using bits 31:22 of the linear address, it controls access to a 4-MByte region of the linear-address space. Use of the PDE depends on CR4.PSE and the PDE’s PS flag (bit 7):
• If CR4.PSE = 1 and the PDE’s PS flag is 1, the PDE maps a 4-MByte page (see Table 4-4).
— Bits 31:12 are from the PTE.
Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page
Bit Position(s)    Contents
0 (P)              Present; must be 1 to map a 4-MByte page
1 (R/W)            Read/write; if 0, writes may not be allowed to the 4-MByte page referenced by this entry (depends on CPL and CR0.
— Bits 11:0 are from the original linear address.
If a paging-structure entry’s P flag (bit 0) is 0 or if the entry sets any reserved bit, the entry is used neither to reference another paging-structure entry nor to map a page.
— If the P flag of a PTE is 1, bit 7 is reserved.
— If the P flag and the PS flag of a PDE are both 1, bit 12 is reserved. (If CR4.
those that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are highlighted because they determine how such an entry is used.
4.4 PAE PAGING
A logical processor uses PAE paging if CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.
ters. (This is different from the other paging modes, in which there is one hierarchy referenced by CR3.) Section 4.4.1 discusses the PDPTE registers.
Table 4-8 gives the format of a PDPTE. If any of the PDPTEs sets both the P flag (bit 0) and any reserved bit, the MOV to CR instruction causes a general-protection exception (#GP(0)) and the PDPTEs are not loaded. 1 As shown in Table 4-8, bits 2:1, 8:5, and 63:MAXPHYADDR are reserved in the PDPTEs.
processor ignores bits 63:1, and there is no mapping for the 1-GByte region controlled by PDPTEi. A reference using a linear address in this region causes a page-fault exception (see Section 4.
4.4.1) A page directory comprises 512 64-bit entries (PDEs). A PDE is selected using the physical address defined as follows:
— Bits 51:12 are from PDPTEi.
— Bits 11:3 are bits 29:21 of the linear address.
— Bits 2:0 are 0.
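The PDE selection rule above can be sketched in Python. This is a minimal illustration rather than text from the manual; the function name and the sample PDPTE and linear-address values are hypothetical.

```python
# Sketch: physical address of the PDE selected under PAE paging.
ADDR_51_12 = 0x000FFFFFFFFFF000  # mask covering bits 51:12


def pae_pde_address(pdpte: int, linear: int) -> int:
    """Combine bits 51:12 of PDPTEi with bits 29:21 of the linear address."""
    table_base = pdpte & ADDR_51_12   # bits 51:12 come from PDPTEi
    index = (linear >> 21) & 0x1FF    # bits 29:21 select one of 512 PDEs
    return table_base | (index << 3)  # bits 2:0 are 0 (entries are 8 bytes)


# Hypothetical values, chosen only for illustration.
print(hex(pae_pde_address(0x12345001, 0x40200000)))
```

Shifting the 9-bit index left by 3 places it in bits 11:3 and leaves bits 2:0 clear, which matches the 8-byte size of each entry.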
Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s)    Contents
0 (P)              Present; must be 1 to map a 2-MByte page
1 (R/W)            Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by this entry (depends on CPL and CR0.
A reference using a linear address that is successfully translated to a physical address is performed only if allowed by the access rights of the translation; see Section 4.6. Figure 4-7 gives a summary of the formats of CR3 and the paging-structure entries with PAE paging.
Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page
Bit Position(s)    Contents
0 (P)              Present; must be 1 to map a 4-KByte page
1 (R/W)            Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (depends on CPL and CR0.
that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are highlighted because they determine how a paging-structure entry is used.
4.5 IA-32E PAGING
A logical processor uses IA-32e paging if CR0.PG = 1, CR4.
bits corresponds to 4 PBytes, linear addresses are limited to 48 bits; at most 256 TBytes of linear-address space may be accessed at any given time. IA-32e paging uses a hierarchy of paging structures to produce a translation for a linear address.
• A 4-KByte naturally aligned page-directory-pointer table is located at the physical address specified in bits 51:12 of the PML4E (see Table 4-13). A page-directory-pointer table comprises 512 64-bit entries (PDPTEs). A PDPTE is selected using the physical address defined as follows:
— Bits 51:12 are from the PML4E.
Because a PDE is identified using bits 47:21 of the linear address, it controls access to a 2-MByte region of the linear-address space.
Table 4-13. Format of an IA-32e PML4 Entry (PML4E) that References a Page-Directory-Pointer Table
Bit Position(s)    Contents
0 (P)              Present; must be 1 to referen.
• If the PDE’s PS flag is 1, the PDE maps a 2-MByte page (see Table 4-15). The final physical address is computed as follows:
Table 4-14.
— Bits 51:21 are from the PDE.
— Bits 20:0 are from the original linear address.
• If the PDE’s PS flag is 0, a 4-KByte naturally aligned page table is located at the physical address specified in bits 51:12 of the PDE (see Table 4-16).
comprises 512 64-bit entries (PTEs). A PTE is selected using the physical address defined as follows:
— Bits 51:12 are from the PDE.
— Bits 11:0 are from the original linear address.
If a paging-structure entry’s P flag (bit 0) is 0 or if the entry sets any reserved bit, the entry is used neither to reference another paging-structure entry nor to map a page.
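As an illustration of the IA-32e bit-field rules above, the following Python sketch splits a 48-bit linear address into its table indices and forms the final physical address for a 4-KByte and a 2-MByte page. The function names and sample values are hypothetical, and the sketch ignores the P flag, reserved bits, and access rights.

```python
# Sketch: IA-32e linear-address fields and final physical addresses.
ADDR_51_12 = 0x000FFFFFFFFFF000  # bits 51:12
ADDR_51_21 = 0x000FFFFFFFE00000  # bits 51:21


def split_ia32e_linear(linear: int) -> dict:
    """Return the four 9-bit table indices and the 12-bit page offset."""
    return {
        "pml4": (linear >> 39) & 0x1FF,  # bits 47:39 select the PML4E
        "pdpt": (linear >> 30) & 0x1FF,  # bits 38:30 select the PDPTE
        "pd": (linear >> 21) & 0x1FF,    # bits 29:21 select the PDE
        "pt": (linear >> 12) & 0x1FF,    # bits 20:12 select the PTE
        "offset": linear & 0xFFF,        # bits 11:0
    }


def final_address_4k(pte: int, linear: int) -> int:
    """Bits 51:12 from the PTE, bits 11:0 from the linear address."""
    return (pte & ADDR_51_12) | (linear & 0xFFF)


def final_address_2m(pde: int, linear: int) -> int:
    """Bits 51:21 from the PDE, bits 20:0 from the linear address."""
    return (pde & ADDR_51_21) | (linear & 0x1FFFFF)
```

Masking with the address fields discards flag bits such as P, R/W, and XD (bit 63), so they cannot leak into the computed address.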
• If the P flag of a PML4E or a PDPTE is 1, the PS flag is reserved.
• If the P flag and the PS flag of a PDE are both 1, bits 20:13 are reserved.
• If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
— Data reads. Data may be read from any linear address with a valid translation for which the U/S flag (bit 2) is 1 in every paging-structure entry controlling the translation.
both the R/W flag and the U/S flag are 1 in every paging-structure entry controlling the translation.
— Instruction fetches.
• For 32-bit paging or if IA32_EFER.
is 1; it is 0 if a user-mode (CPL = 3) access did so. This flag describes the access causing the page-fault exception, not the access rights specified by paging.
Page-fault exceptions occur only due to an attempt to use a linear address. Failures to load the PDPTE registers with PAE paging (see Section 4.
4.9 PAGING AND MEMORY TYPING
The memory type of a memory access refers to the type of caching used for that access. Chapter 11, “Memory Cache Control” provides many details regarding memory typing in the Intel-64 and IA-32 architectures.
The PAT is a 64-bit MSR (IA32_PAT; MSR index 277H) comprising eight (8) 8-bit entries (entry i comprises bits 8i+7:8i of the MSR). For any access to a physical address, the table combines the memory type specified for that physical address by the MTRRs with a memory type selected from the PAT.
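The entry layout described above (entry i occupies bits 8i+7:8i) can be sketched as a simple shift and mask. The function name is hypothetical, and the sample value is assumed here to be the commonly documented power-on default of IA32_PAT; treat it as illustrative data rather than something stated in this section.

```python
# Sketch: extract 8-bit entry i from a 64-bit IA32_PAT value.
def pat_entry(pat_msr: int, i: int) -> int:
    """Return bits 8i+7:8i of the MSR value."""
    if not 0 <= i < 8:
        raise ValueError("the PAT has exactly eight entries (0 through 7)")
    return (pat_msr >> (8 * i)) & 0xFF


# Assumed sample value (see the note above).
SAMPLE_PAT = 0x0007040600070406
entries = [pat_entry(SAMPLE_PAT, i) for i in range(8)]
print([hex(e) for e in entries])
```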
tively. Section 4.10.3 explains how software can remove inconsistent cached information by invalidating portions of the TLBs and paging-structure caches. Section 4.10.4 describes special considerations for multiprocessor systems.
4.10.1.2 Caching Translations in TLBs
The processor may accelerate the paging process by caching individual translations in translation lookaside buffers (TLBs). Each entry in a TLB is an individual translation. Each translation is referenced by a page number.
entries in memory. See Section 4.10.3.2 for how software can ensure that the processor uses the modified paging-structure entries. If the paging structures specify a translation using a page larger than 4 KBytes, some processors may choose to cache multiple smaller-page TLB entries for that translation.
— The value of the R/W flag of the PML4E.
— The value of the U/S flag of the PML4E.
— The value of the XD flag of the PML4E.
— The values of the PCD and PWT flags of the PML4E.
— The processor may create a PDPTE-cache entry even if there are no translations for any linear address that might use that entry.
— If the processor creates a PDPTE-cache entry, the processor may retain it unmodified even if software subsequently modifies the corresponding PML4E or PDPTE in memory.
For example, if the R/W flag is 0 in a PML4E, then the R/W flag will be 0 in any PDPTE-cache entry for a PDPTE from the page-directory-pointer table referenced by that PML4E. This is because the R/W flag of each such PDPTE-cache entry is the logical-AND of the R/W flags in the appropriate PML4E and PDPTE.
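The logical-AND relationship in this example can be written out directly. The sketch below is illustrative only; the function name and the sample entry values are hypothetical.

```python
# Sketch: R/W flag held in a PDPTE-cache entry.
RW_BIT = 1  # the R/W flag is bit 1 of a paging-structure entry


def cached_rw(pml4e: int, pdpte: int) -> int:
    """Logical-AND of the R/W flags of the PML4E and the PDPTE."""
    return ((pml4e >> RW_BIT) & 1) & ((pdpte >> RW_BIT) & 1)


# A read-only PML4E (R/W = 0) forces R/W = 0 in the cached entry,
# even though the PDPTE itself allows writes (R/W = 1).
print(cached_rw(0x1001, 0x2003))
```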
(Any of the above steps would be skipped if the processor does not support the cache in question.) If the processor does not find a TLB or paging-structure-cache entry for the linear address, it uses the linear address to traverse the entire paging-structure hierarchy, as described in Section 4.
4.10.3 Invalidation of TLBs and Paging-Structure Caches
As noted in Section 4.10.1 and Section 4.10.2, the processor may create entries in the TLBs and the paging-structure caches when linear addresses are translated, and it may retain these entries even after the paging structures used to create them have been modified.
In addition to the instructions identified above, page faults invalidate entries in the TLBs and paging-structure caches. In particular, a page-fault exception resulting from an.
• If software using PAE paging modifies a PDPTE, it should reload CR3 with the register’s current value to ensure that the modified PDPTE is loaded into the corresponding PDPTE register (see Section 4.
in response to an attempted user-mode access) but no other adverse behavior. Such an exception will occur at most once for each affected linear address (see Section 4.
TLB shootdown algorithm for processors supporting the Intel-64 and IA-32 architectures:
1. Begin barrier: Stop all but one logical processor; that is, cause all but one to execute the HLT instruction or to enter a spin loop.
2.
4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX)
The architecture for virtual-machine extensions (VMX) includes features that interact with paging. Section 4.11.1 discusses ways in which VMX-specific control transfers, called VMX transitions, specially affect paging.
concurrently information for multiple address spaces in its TLBs and paging-structure caches. See Section 25.1 for details. When EPT is in use, the addresses in the paging-structures are not used as physical addresses to access memory and memory-mapped I/O.
segments can be mapped to pages in several ways. To implement a flat (unsegmented) addressing environment, for example, all the code, data, and stack modules can be mapped to one or more large segments (up to 4-GBytes) that share the same range of linear addresses (see Figure 3-2 in Section 3.
CHAPTER 5
PROTECTION
In protected mode, the Intel 64 and IA-32 architectures provide a protection mechanism that operates at both the segment level and the page level.
there is no control bit for turning the protection mechanism on or off. The part of the segment-protection mechanism that is based on privilege levels can essentia.
procedure. The term current privilege level (CPL) refers to the setting of this field.
• User/supervisor (U/S) flag — (Bit 2 of paging-structure entries.) Determines the type of page: user or supervisor.
• Read/write (R/W) flag — (Bit 1 of paging-structure entries.
Many different styles of protection schemes can be implemented with these fields and flags. When the operating system creates a descriptor, it places values in these fields and flags in keeping with the particular protection style chosen for an operating system or executive.
The following sections describe how the processor uses these fields and flags to perform the various categories of checks described in the introduction to this chapter.
5.3 LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from addressing memory locations outside the segment. The effective value of the limit depends on the setting of the G (granularity) flag (see Figure 5-1).
• A doubleword at an offset greater than the (effective-limit – 3)
• A quadword at an offset greater than the (effective-limit – 7)
For expand-down data segments, the segment limit has the same function but is interpreted differently.
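The limit rules listed above for ordinary (expand-up) segments follow one pattern: an access of a given size faults when its offset exceeds the effective limit minus (size – 1). The Python sketch below illustrates that pattern together with the usual scaling of the 20-bit limit field by the G flag. The function names are hypothetical, and expand-down segments are deliberately not modeled.

```python
# Sketch: limit checking for expand-up segments only.
def effective_limit(limit_field: int, g_flag: int) -> int:
    """Scale the 20-bit limit field according to the G (granularity) flag."""
    if g_flag:
        # 4-KByte granularity: the low 12 bits of an offset are not checked.
        return (limit_field << 12) | 0xFFF
    return limit_field  # byte granularity


def violates_limit(offset: int, size: int, eff_limit: int) -> bool:
    """True if an access of `size` bytes at `offset` runs past the limit."""
    return offset > eff_limit - (size - 1)


eff = effective_limit(0x00000, 1)      # one 4-KByte unit: offsets 0..FFFH
print(violates_limit(0xFFC, 4, eff))   # doubleword ending at the limit
print(violates_limit(0xFFD, 4, eff))   # doubleword crossing the limit
```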
The processor examines type information at various times while operating on segment selectors and segment descriptors. The following list gives examples of typical operations w.
instruction. If the descriptor type is for a code segment or call gate, a call or jump to another code segment is indicated; if the descriptor type is for a TSS or task gate, a task switch is indicated.
The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level violation, it generates a general-protection exception (#GP).
example, if the DPL of a data segment is 1, only programs running at a CPL of 0 or 1 can access the segment.
— Nonconforming code segment (without using a call gate) — The DPL indicates the privilege level that a program or task must be at to access the segment.
loads the segment selector into the segment register if the DPL is numerically greater than or equal to both the CPL and the RPL.
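The data-segment check described above reduces to a single comparison. The sketch below is a hypothetical helper for illustration only; it covers loading a data-segment register and does not model the stricter rule for the SS register or the special handling of conforming code segments.

```python
# Sketch: privilege check when loading a data-segment register.
def can_load_data_segment(cpl: int, rpl: int, dpl: int) -> bool:
    """Allowed when the DPL is numerically >= both the CPL and the RPL."""
    return dpl >= max(cpl, rpl)


print(can_load_data_segment(cpl=0, rpl=3, dpl=2))  # the RPL weakens the access
print(can_load_data_segment(cpl=1, rpl=1, dpl=1))
```

Taking the maximum of CPL and RPL means the less privileged of the two values decides the outcome.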
As demonstrated in the previous examples, the addressable domain of a program or task varies as its CPL changes. When the CPL is 0, data segments at all privilege levels are.
• Load a data-segment register with a segment selector for a nonconforming, readable, code segment.
• Load a data-segment register with a segment selector for a conforming, readable, code segment.
• The target operand points to a TSS, which contains the segment selector for the target code segment.
• The target operand points to a task gate, which points to a TSS, which in turn contains the segment selector for the target code segment.
• The RPL of the segment selector of the destination code segment.
• The conforming (C) flag in the segment descriptor for the destination code segment, which determines whether the segment is a conforming (C flag is set) or nonconforming (C flag is clear) code segment.
The RPL of the segment selector that points to a nonconforming code segment has a limited effect on the privilege check. The RPL must be numerically less than or equal to the CPL of the calling procedure for a successful control transfer to occur.
In the example in Figure 5-7, code segment D is a conforming code segment. Therefore, calling procedures in both code segment A and B can access code segment D (using either segment selector D1 or D2, respectively), because they both have CPLs that are greater than or equal to the DPL of the conforming code segment.
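The two cases for a direct far CALL or JMP to a code segment (without a gate) can be sketched as follows. The helper name is hypothetical; the nonconforming branch assumes the usual rule that the CPL must equal the DPL, in addition to the RPL condition stated in the text.

```python
# Sketch: privilege check for a direct transfer to a code segment.
def can_transfer_direct(cpl: int, rpl: int, dpl: int, conforming: bool) -> bool:
    if conforming:
        # Conforming: the CPL must be numerically >= the DPL; the RPL is
        # not part of this check.
        return cpl >= dpl
    # Nonconforming: same privilege level, and RPL numerically <= CPL.
    return cpl == dpl and rpl <= cpl


print(can_transfer_direct(cpl=3, rpl=3, dpl=1, conforming=True))
print(can_transfer_direct(cpl=3, rpl=3, dpl=1, conforming=False))
```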
5.8.3 Call Gates
Call gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level protection mechanism.
Note that the P flag in a gate descriptor is normally always set to 1. If it is set to 0, a not present (#NP) exception is generated when a program attempts to access the descriptor. The operating system can use the P flag for special purposes.
• Target code segments referenced by a 64-bit call gate must be 64-bit code segments (CS.L = 1, CS.D = 0). If not, the reference generates a general-protection exception, #GP (CS selector).
• Only 64-bit mode call gates can be referenced in IA-32e mode (64-bit mode and compatibility mode).
5.8.4 Accessing a Code Segment Through a Call Gate
To access a call gate, a far pointer to the gate is provided as a target operand in a CALL or JMP instruction.
The privilege checking rules are different depending on whether the control transfer was initiated with a CALL or a JMP instruction, as shown in Table 5-1.
segments B and C. The dotted line shows that a calling procedure in code segment A cannot access call gate B. The RPL of the segment selector to a call gate must satisfy the same test as the CPL of the calling procedure; that is, the RPL must be less than or equal to the DPL of the call gate.
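A compact way to read the call-gate rules is as two tests: one against the gate's DPL and one against the destination code segment's DPL. The sketch below is an illustrative helper with hypothetical names; the destination-segment conditions reflect the CALL and JMP rules as commonly summarized for Table 5-1 and should be checked against that table.

```python
# Sketch: privilege checks for a transfer through a call gate.
def can_use_call_gate(cpl: int, rpl: int, gate_dpl: int,
                      target_dpl: int, target_conforming: bool,
                      is_call: bool) -> bool:
    # Both the CPL and the selector's RPL must be <= the gate's DPL.
    if max(cpl, rpl) > gate_dpl:
        return False
    if is_call or target_conforming:
        # CALL (either segment type) or JMP to a conforming segment:
        # the destination DPL must be numerically <= the CPL.
        return target_dpl <= cpl
    # JMP to a nonconforming segment: the destination DPL must equal the CPL.
    return target_dpl == cpl


# A privilege-level-3 caller reaching level-0 code through a DPL-3 gate.
print(can_use_call_gate(3, 3, 3, 0, False, is_call=True))
print(can_use_call_gate(3, 3, 3, 0, False, is_call=False))
```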
Call gates allow a single code segment to have procedures that can be accessed at different privilege levels. For example, an operating system located in a code segment may ha.
Each task must define up to 4 stacks: one for applications code (running at privilege level 3) and one for each of the privilege levels 2, 1, and 0 that are used. (If only two privilege levels are used [3 and 0], then only two stacks must be defined.
3. Checks the stack-segment descriptor for the proper privileges and type and generates an invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP registers.
dure, one of the parameters can be a pointer to a data structure, or the saved contents of the SS and ESP registers may be used to access parameters in the old stack space. The size of the data items passed to the called procedure depends on the call gate size, as described in Section 5.
intended to execute returns from procedures that were called with a CALL instruction. It does not support returns from a JMP instruction, because the JMP instruction does not save a return instruction pointer on the stack.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter count (in bytes obtained from the RET instruction) to the current ESP register value, to step past the parameters on the calling procedure’s stack.
• Stack segment — Computed by adding 24 to the value in IA32_SYSENTER_CS.
• Stack pointer — Reads this from ECX.
The SYSENTER and SYSEXIT instructions perform “fast.
When SYSEXIT transfers control to compatibility mode user code when the operand size attribute is 32 bits, the following fields are generated and bits set:
• Target code segment — Computed by adding 16 to the value in IA32_SYSENTER_CS.
When SYSRET transfers control to 32-bit mode user code using a 32-bit operand size, the processor gets the privilege level 3 target instruction and stack pointer from:
• Target code segment — Reads a non-NULL selector from IA32_STAR[63:48].
general-protection exception (#GP) is generated. The following system instructions are privileged instructions:
• LGDT — Load GDT register.
• LLDT — Load LDT register.
• LTR — Load task register.
• LIDT — Load IDT register.
The processor automatically performs first, second, and third checks during instruction execution. Software must explicitly request the fourth check by issuing an ARPL instruction. The fifth check (offset alignment) is performed automatically at privilege level 3 if alignment checking is turned on.
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions)
When the processor accesses any code or data segment it checks the read/write privileges assigned to the segment to verify that the intended read or write operation is allowed.
destination register and sets the ZF flag in the EFLAGS register. If the segment selector is not visible at the current privilege level or is an invalid type for the LSL instruction, the instruction does not modify the destination register and clears the ZF flag.
Now assume that instead of setting the RPL of the segment selector to 3, the application program sets the RPL to 0 (segment selector D2). The operating system can now access data segment D, because its CPL and the RPL of segment selector D2 are both equal to the DPL of data segment D.
The example in Figure 5-15 demonstrates how the ARPL instruction is intended to be used. When the operating-system receives segment selector D2 from the application program, it.
page-fault exception mechanism. This chapter describes the protection violations which lead to page-fault exceptions.
5.11.1 Page-Protection Flags
Protection information for pages is contained in two flags in a paging-structure entry (see Chapter 4): the read/write flag (bit 1) and the user/supervisor flag (bit 2).
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its state following reset initialization), all pages are both readable and writable (write-protection is ignored). When the processor is in user mode, it can write only to user-mode pages that are read/write accessible.
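The write-permission behavior described here can be sketched as a small decision function. The names are hypothetical, the `rw` and `us` arguments stand for the combined R/W and U/S flags of the entries controlling the translation, and the CR0.WP = 1 branch reflects the usual rule that supervisor writes then honor the R/W flag.

```python
# Sketch: may a write to a present page proceed?
def write_allowed(supervisor: bool, cr0_wp: bool, rw: bool, us: bool) -> bool:
    if supervisor:
        if not cr0_wp:
            return True   # WP clear: write protection is ignored
        return rw         # WP set: read-only pages stay read-only
    # User mode: only user-mode pages that are read/write accessible.
    return us and rw


print(write_allowed(supervisor=True, cr0_wp=False, rw=False, us=False))
print(write_allowed(supervisor=False, cr0_wp=False, rw=True, us=False))
```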
exception is generated. If an exception is generated by segmentation, no paging exception is generated. Page-level protections cannot be used to override segment-level protection. For example, a code segment is by definition not writable.
5.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT
In addition to page-level protection offered by the U/S and R/W flags, paging structures used with PAE paging and IA-32e paging (see Chapter 4) provide the execute-disable bit.
5.13.2 Execute-Disable Page Protection
The execute-disable bit in the paging structures enhances page protection for data pages. Instructions cannot be fetched from a memory page if IA32_EFER.NXE = 1 and the execute-disable bit is set in any of the paging-structure entries used to map the page.
5.13.3 Reserved Bit Checking
The processor enforces reserved bit checking in paging data structure entries. The bits being checked vary with paging mode and may vary with the size of physical address space.
If execute disable bit capability is not enabled or not available, reserved bit checking in 64-bit mode includes bit 63 and additional bits. This and reserved bit checking for legacy 32-bit paging modes are shown in Table 5-10.
5.13.4 Exception Handling
When execute disable bit capability is enabled (IA32_EFER.NXE = 1), conditions for a page fault to occur include the same conditions that apply to .
CHAPTER 6
INTERRUPT AND EXCEPTION HANDLING
This chapter describes the interrupt and exception-handling mechanism when operating in protected mode on an Intel 64 or IA-32 processor. Most of the information provided here also applies to interrupt and exception mechanisms used in real-address, virtual-8086 mode, and 64-bit mode.
6.2 EXCEPTION AND INTERRUPT VECTORS
To aid in handling exceptions and interrupts, each architecturally defined exception and each interrupt condition requiring special handling by the processor is assigned a unique identification number, called a vector.
(see Section 6.2, “Exception and Interrupt Vectors”). Asserting the NMI pin signals a non-maskable interrupt (NMI), which is assigned to interrupt vector 2.
Table 6-1. Protected-Mode Exceptions and Interrupts
Vector No.
The processor’s local APIC is normally connected to a system-based I/O APIC. Here, external interrupts received at the I/O APIC’s pins can be direct.
defined interrupt vectors from 0 through 255; those that can be delivered through the local APIC include interrupt vectors 16 through 255. The IF flag in the EFLAGS register permits all maskable hardware interrupts to be masked as a group (see Section 6.
6.4.2 Software-Generated Exceptions
The INTO, INT 3, and BOUND instructions permit exceptions to be generated in software. These instructions allow checks for exception conditions to be performed at points in the instruction stream.
• Aborts — An abort is an exception that does not always report the precise location of the instruction causing the exception and does not allow a restart of the program or task that caused the exception.
EFLAGS.OF (overflow) flag. The trap handler for this exception resolves the overflow condition. Upon return from the trap handler, program or task execution continues at the instruction following the INTO instruction.
It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector 2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI interrupt.
is an interrupt. As with the INT n instruction (see Section 6.4.2, “Software-Generated Exceptions”), when an interrupt is generated through the INTR pin to an exception vector, the processor does not push an error code on the stack, so the exception handler may not operate correctly.
6.8.3 Masking Exceptions and Interrupts When Switching Stacks
To switch to a different stack segment, software often uses a pair of instructions, f.
While priority among these classes listed in Table 6-2 is consistent throughout the architecture, exceptions within each class are implementation-dependent and may vary from processor to processor.
protected mode). Unlike the GDT, the first entry of the IDT may contain a descriptor. To form an index into the IDT, the processor scales the exception or interrupt vector by eight (the number of bytes in a gate descriptor).
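The scaling step can be illustrated with a short sketch. The function names are hypothetical; the `gate_size` parameter is an assumption added so the same helper also covers the 16-byte descriptors of a 64-bit mode IDT, and the limit test models the usual requirement that the whole descriptor lie within the IDT limit.

```python
# Sketch: locating a gate descriptor in the IDT.
def idt_descriptor_offset(vector: int, gate_size: int = 8) -> int:
    """Byte offset of the descriptor for `vector` from the IDT base."""
    if not 0 <= vector <= 255:
        raise ValueError("vectors range from 0 through 255")
    return vector * gate_size


def within_idt_limit(vector: int, idt_limit: int, gate_size: int = 8) -> bool:
    """True if the last byte of the descriptor is at or below the limit."""
    return idt_descriptor_offset(vector, gate_size) + gate_size - 1 <= idt_limit


print(idt_descriptor_offset(14))       # page-fault vector, 8-byte gates
print(within_idt_limit(255, 0x7FF))    # a full 256-entry table
```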
6.11 IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
• Task-gate descriptor
• Interrupt-gate descriptor
• Trap-gate descriptor
Figure 6-2 shows the formats for the task-gate, interrupt-gate, and trap-gate descriptors.
6.12 EXCEPTION AND INTERRUPT HANDLING
The processor handles calls to exception- and interrupt-handlers similar to the way it handles calls with a CALL instruction to a procedure or a task.
“Returning from a Called Procedure”). If index points to a task gate, the processor executes a task switch to the exception- or interrupt-handler task in a manner similar to a CALL to a task gate (see Section 7.
When the processor performs a call to the exception- or interrupt-handler procedure:
• If the handler procedure is going to be executed at a numerically lower privilege level, a stack switch occurs. When the stack switch occurs:
a.
To return from an exception- or interrupt-handler procedure, the handler must use the IRET (or IRETD) instruction. The IRET instruction is similar to the RET instruction except that it restores the saved flags into the EFLAGS register.
not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP).
of the EFLAGS register on the stack. Accessing a handler procedure through a trap gate does not affect the IF flag.
6.12.2 Interrupt Tasks
When an exception or interrupt handler is accessed through a task gate in the IDT, a task switch results.
6.13 ERROR CODE
When an exception condition is related to a specific segment, the processor pushes an error code onto the stack of the exception handler (whether it is a procedure or task). The error code has the format shown in Figure 6-6.
clear, indicates that the index refers to a descriptor in the GDT or the current LDT.
TI GDT/LDT (bit 2) — Only used when the IDT flag is clear.
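A handler that wants to inspect a segment-related error code can decode it with a few masks. The sketch below assumes the conventional layout (EXT in bit 0, IDT in bit 1, TI in bit 2, selector index in bits 15:3); the function and key names are hypothetical, and the page-fault error code uses a different format that is not covered here.

```python
# Sketch: decoding a segment-related exception error code.
def decode_error_code(code: int) -> dict:
    return {
        "ext": bool(code & 0x1),        # bit 0: event external to the program
        "idt": bool(code & 0x2),        # bit 1: index refers to an IDT gate
        "ti": bool(code & 0x4),         # bit 2: meaningful only if idt is False
        "index": (code >> 3) & 0x1FFF,  # bits 15:3: selector index
    }


print(decode_error_code(0x0103))
```

Because TI is used only when the IDT flag is clear, a caller should ignore the "ti" field whenever "idt" is true.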
• The stack pointer (SS:RSP) is pushed unconditionally on interrupts. In legacy modes, this push is conditional and based on a change in current privilege level (CPL).
• The new SS is set to NULL if there is a change in CPL.
ware attempts to reference an interrupt gate with a target RIP that is not in canonical form. The target code segment referenced by the interrupt gate must be a 64-bit code segment (CS.L = 1, CS.D = 0).
6.14.3 IRET in IA-32e Mode
In IA-32e mode, IRET executes with an 8-byte operand size. There is nothing that forces this requirement. The stack is formatted in such a way that for actions where IRET is required, the 8-byte IRET operand size works correctly.
In summary, a stack switch in IA-32e mode works like the legacy stack switch, except that a new SS selector is not loaded from the TSS.
6.15 EXCEPTION AND INTERRUPT REFERENCE
The following sections describe conditions which generate exceptions and interrupts.
Interrupt 0—Divide Error Exception (#DE)
Exception Class Fault.
Description
Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result cannot be represented in the number of bits specified for the destination operand.
Interrupt 1—Debug Exception (#DB)
Exception Class Trap or Fault. The exception handler can distinguish between traps or faults by examining the contents of DR6 and the other debug registers.
Interrupt 2—NMI Interrupt
Exception Class Not applicable.
Description
The nonmaskable interrupt (NMI) is generated externally by asserting the processor’s NMI pin or through an NMI request set by the I/O APIC to the local APIC.
Interrupt 3—Breakpoint Exception (#BP)
Exception Class Trap.
Description
Indicates that a breakpoint instruction (INT 3) was executed, causing a breakpoint trap to be generated.
Interrupt 4—Overflow Exception (#OF)
Exception Class Trap.
Description
Indicates that an overflow trap occurred when an INTO instruction was executed. The INTO instruction checks the state of the OF flag in the EFLAGS register.
Vol. 3 6-33 INTERRUP T AND EXCEP TION HANDLING Interrup t 5—BOUND Range Ex ceeded Ex cep tion (#BR) Exc eption Class Fa u l t . Description Indicates that a BOUND-range-ex ceeded fault occurred when a BOUND instruction was executed.
Interrupt 6—Invalid Opcode Exception (#UD)
Exception Class: Fault.
Description: Indicates that the processor did one of the following things:
• Attempted to execute an invalid or reserved opcode.
processor and earlier IA-32 processors, this exception is not generated as the result of prefetching and preliminary decoding of an invalid instruction. (See Section 6.5, “Exception Classifications,” for general rules for taking of interrupts and exceptions.
Interrupt 7—Device Not Available Exception (#NM)
Exception Class: Fault.
Description: Indicates one of the following things: The device-not-.
Saved Instruction Pointer: The saved contents of CS and EIP registers point to the floating-point instruction or the WAIT/FWAIT instruction that generated the exception.
Interrupt 8—Double Fault Exception (#DF)
Exception Class: Abort.
Description: Indicates that the processor detected a second exception while calling an exception handler for a prior exception.
A segment or page fault may be encountered while prefetching instructions; however, this behavior is outside the domain of Table 6-5. Any further faults generated while the processor is attempting to transfer control to the appropriate fault handler could still lead to a double-fault sequence.
If the double fault occurs when any portion of the exception handling machine state is corrupted, the handler cannot be invoked and the processor must be reset.
Interrupt 9—Coprocessor Segment Overrun
Exception Class: Abort. (Intel reserved; do not use. Recent IA-32 processors do not generate this exception.
Interrupt 10—Invalid TSS Exception (#TS)
Exception Class: Fault.
Description: Indicates that there was an error related to a TSS. Such an error might be detected during a task switch or during the execution of instructions that use information from a TSS.
Stack segment selector index: The stack segment selector exceeds descriptor table limit.
Stack segment selector index: The stack segment selector is NULL.
Stack segment selector index: The stack segment descriptor is a non-data segment.
This exception can be generated either in the context of the original task or in the context of the new task (see Section 7.3, “Task Switching”). Until the processor has completely verified the presence of the new TSS, the exception is generated in the context of the original task.
If an invalid TSS exception occurs during a task switch, it can occur before or after the commit-to-new-task point.
Interrupt 11—Segment Not Present (#NP)
Exception Class: Fault.
Description: Indicates that the present flag of a segment or gate descriptor is clear.
tors for the segment selectors in a new TSS, the CS and EIP registers point to the first instruction in the new task.
Interrupt 12—Stack Fault Exception (#SS)
Exception Class: Fault.
Description: Indicates that one of the following stack related conditions was detected:
• A limit violation is detected during an operation that refers to the SS register.
Program State Change: A program-state change does not generally accompany a stack-fault exception, because the instruction that generated the fault is not executed. Here, the instruction can be restarted after the exception handler has corrected the stack fault condition.
Interrupt 13—General Protection Exception (#GP)
Exception Class: Fault.
Description: Indicates that the processor detected one of a class of protection violations called “general-protection violations.
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Referencing an entry in the IDT (following an interrupt or exception) that is not an interrupt, trap, or task gate.
• A selector from a TSS involved in a task switch.
• IDT vector number.
Saved Instruction Pointer: The saved contents of CS and EIP registers point to the instruction that generated the exception.
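When the #GP error code carries a selector or an IDT vector number, a handler has to split it into fields before it can tell which case applies. The sketch below assumes the standard selector-format error code (EXT in bit 0, IDT in bit 1, TI in bit 2, index in bits 15:3); the struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded selector-format error code (names are illustrative). */
struct selector_error_code {
    bool     ext;   /* bit 0: event was external to the program */
    bool     idt;   /* bit 1: index refers to a gate descriptor in the IDT */
    bool     ti;    /* bit 2: 0 = GDT, 1 = LDT (meaningful only when idt is 0) */
    uint16_t index; /* bits 15:3: descriptor index or IDT vector number */
};

struct selector_error_code decode_selector_error_code(uint32_t code)
{
    struct selector_error_code d;
    d.ext   = (code & 0x1u) != 0;
    d.idt   = (code & 0x2u) != 0;
    d.ti    = (code & 0x4u) != 0;
    d.index = (uint16_t)((code >> 3) & 0x1FFFu);
    return d;
}
```

For instance, an error code of 10AH decodes to the IDT flag set with index 21H, which points at IDT vector 21H rather than at a GDT or LDT descriptor.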
• If the segment descriptor pointed to by the segment selector in the destination operand is a code segment and it has both the D-bit and the L-bit set.
• If the segment descriptor from a 64-bit call gate is in non-canonical space.
Interrupt 14—Page-Fault Exception (#PF)
Exception Class: Fault.
Description: Indicates that, with paging enabled (the PG flag in the CR0 regis.
— The U/S flag indicates whether the processor was executing at user mode (1) or supervisor mode (0) at the time of the exception.
— The RSVD flag indicates that the processor detected 1s in reserved bits of the page directory, when the PSE or PAE flags in control register CR4 are set to 1.
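A page-fault handler typically tests these flags with bit masks. The sketch below assumes the usual bit positions of the page-fault error code (P in bit 0, W/R in bit 1, U/S in bit 2, RSVD in bit 3); the macro and function names are illustrative, not taken from the manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits (macro names are illustrative). */
#define PF_ERR_P    (1u << 0) /* 0 = non-present page, 1 = protection violation */
#define PF_ERR_WR   (1u << 1) /* 0 = read access, 1 = write access */
#define PF_ERR_US   (1u << 2) /* 0 = supervisor mode, 1 = user mode */
#define PF_ERR_RSVD (1u << 3) /* 1 = reserved bit was set in a paging structure */

/* True if the faulting access was made in user mode. */
bool pf_in_user_mode(uint32_t error_code)
{
    return (error_code & PF_ERR_US) != 0;
}

/* True if the fault was caused by a reserved-bit violation. */
bool pf_reserved_bit_violation(uint32_t error_code)
{
    return (error_code & PF_ERR_RSVD) != 0;
}
```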
second page fault can occur. If a page fault is caused by a page-level protection violation, the access flag in the page-directory entry is set when the fault occurs.
description for “Interrupt 10—Invalid TSS Exception (#TS)” in this chapter for additional information on how to handle this situation.
Interrupt 16—x87 FPU Floating-Point Error (#MF)
Exception Class: Fault.
Description: Indicates that the x87 FPU has detected a floating-point error. The NE flag in the register CR0 must be set for an interrupt 16 (floating-point error exception) to be generated.
Prior to executing a waiting x87 FPU instruction or the WAIT/FWAIT instruction, the x87 FPU checks for pending x87 FPU floating-point exceptions (as described in step 2 above).
Interrupt 17—Alignment Check Exception (#AC)
Exception Class: Fault.
Description: Indicates that the processor detected an unaligned memory operand when alignment checking was enabled.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).
Alignment-check exceptions (#AC) are generated only when operating at privilege level 3 (user mode).
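The enabling conditions combine into a single predicate. The sketch below assumes the AM flag is bit 18 of CR0 and the AC flag is bit 18 of EFLAGS. The required alignment depends on the data type, so it is passed in as a parameter here; all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18) /* alignment mask flag in CR0 */
#define EFLAGS_AC (1u << 18) /* alignment check flag in EFLAGS */

/* True when alignment checking is in force: AM set, AC set, and CPL == 3. */
bool alignment_check_enabled(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}

/* True if an access at addr, which must be aligned to `alignment` bytes
 * (assumed to be a power of two), would raise #AC. */
bool access_raises_ac(uint32_t cr0, uint32_t eflags, unsigned cpl,
                      uintptr_t addr, unsigned alignment)
{
    return alignment_check_enabled(cr0, eflags, cpl) &&
           (addr & (uintptr_t)(alignment - 1)) != 0;
}
```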
Interrupt 18—Machine-Check Exception (#MC)
Exception Class: Abort.
Description: Indicates that the processor detected an internal machine error or a bus error, or that an external agent detected a bus error.
For the Pentium 4, Intel Xeon, P6 family, and Pentium processors, a program-state change always accompanies a machine-check exception, and an abort class exception is generated.
Interrupt 19—SIMD Floating-Point Exception (#XM)
Exception Class: Fault.
Description: Indicates the processor has detected an SSE/SSE2/SSE3 SIMD floating-point exception.
Note that because SIMD floating-point exceptions are precise and occur immediately, the situation does not arise where an x87 FPU instruction, a WAIT/FWAIT instruction, or another SSE/SSE2/SSE3 instruction will catch a pending unmasked SIMD floating-point exception.
Saved Instruction Pointer: The saved contents of CS and EIP registers point to the SSE/SSE2/SSE3 instruction that was executed when the SIMD floating-point exception was generated. This is the faulting instruction in which the error condition was detected.
Interrupts 32 to 255—User Defined Interrupts
Exception Class: Not applicable.
Description: Indicates that the processor did one of the following things:
• Executed an INT n instruction where the instruction operand is one of the vector numbers from 32 through 255.
CHAPTER 7
TASK MANAGEMENT
This chapter describes the IA-32 architecture’s task management facilities. These facilities are only available when the processor is running in protected mode. This chapter focuses on 32-bit tasks and the 32-bit TSS structure.
7.1.2 Task State
The following items define the state of the currently executing task:
• The task’s current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).
• The state of the general-purpose registers.
7.1.3 Executing a Task
Software or the processor can dispatch a task for execution in one of the following ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
page tables as other privilege-level-3 tasks can access code and corrupt data and the stack of other tasks. Use of task management facilities for handling multitasking applications is optional.
The processor updates dynamic fields when a task is suspended during a task switch. The following are dynamic fields:
• General-purpose register fields — State of the EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI registers prior to the task switch.
• EIP (instruction pointer) field — State of the EIP register prior to the task switch.
• Previous task link field — Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception).
• Task switches are carried out faster if the pages containing these structures are present in memory before the task switch is initiated.
7.2.2 TSS Descriptor
The TSS, like all other segments, is defined by a segment descriptor.
of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception (#TS). A larger limit is required if an I/O permission bit map is included or if the operating system stores additional data.
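The 67H minimum follows from the size of the 32-bit TSS itself: 104 bytes, so the last valid byte offset is 103 (67H). The struct below is a sketch of that layout as commonly declared in C. The field names are illustrative, and the 16-bit fields are paired with explicit reserved halves so that no compiler padding is needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 32-bit TSS layout (field names are illustrative). */
struct tss32 {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0; uint16_t ss0, reserved1;
    uint32_t esp1; uint16_t ss1, reserved2;
    uint32_t esp2; uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4, cs, reserved5, ss, reserved6;
    uint16_t ds, reserved7, fs, reserved8, gs, reserved9;
    uint16_t ldt_selector, reserved10;
    uint16_t debug_trap;   /* bit 0 is the T (debug trap) flag */
    uint16_t io_map_base;  /* offset of the I/O permission bit map */
};

_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");
_Static_assert(offsetof(struct tss32, io_map_base) == 102, "I/O map base at 66H");

/* A TSS descriptor limit below 67H triggers #TS on a task switch. */
bool tss32_limit_valid(uint32_t limit)
{
    return limit >= sizeof(struct tss32) - 1; /* 67H */
}
```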
7.2.4 Task Register
The task register holds the 16-bit segment selector and the entire segment descriptor (32-bit base address, 16-bit segment limit, and descriptor attributes) for the TSS of the current task (see Figure 2-5).
The LTR instruction loads a segment selector (source operand) into the task register that points to a TSS descriptor in the GDT. It then loads the invisible portion of the task register with information from the TSS descriptor.
7.2.5 Task-Gate Descriptor
A task-gate descriptor provides an indirect, protected reference to a task (see Figure 7-6). It can be placed in the GDT, an LDT, or the IDT. The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT.
to be handled by handler tasks. When an interrupt or exception vector points to a task gate, the processor switches to the specified task. Figure 7-7 illustrates how a task gate in an LDT, a task gate in the GDT, and a task gate in the IDT can all point to the same task.
• An interrupt or exception vector points to a task-gate descriptor in the IDT.
• The current task executes an IRET when the NT flag in the EFLAGS register is set.
JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all mechanisms for redirecting a program.
10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task’s TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.
rules control access to a TSS, software does not need to perform explicit privilege checks on a task switch. Table 7-1 shows the exception conditions that the processor checks for when switching tasks.
The TS (task switched) flag in the control register CR0 is set every time a task switch occurs. System software uses the TS flag to coordinate the actions of the floating-point unit when generating floating-point exceptions with the rest of the processor.
Table 7-2 shows the busy flag (in the TSS segment descriptor), the NT flag, the previous task link field, and TS flag (in control register CR0) during a task switch. The NT flag may be modified by software executing at any privilege level.
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching
A TSS allows only one context to be saved for a task; therefore, once a task is called (dispatched), a recursive (or re-entrant) call to the task would cause the current state of the task to be lost.
In a multiprocessing system, additional synchronization and serialization operations must be added to this procedure to ensure that the TSS and its segment descriptor are both locked when the previous task link field is changed and the busy flag is cleared.
and the page tables point to different pages of physical memory, then the tasks do not share physical addresses. With either method of mapping task linear address spaces, the TSSs for all tasks must lie in a shared area of the physical space, which is accessible to all tasks.
shared LDT point to segments that are mapped to a common area of the physical address space, the data and code in those segments can be shared among the tasks that share the LDT. This method of sharing is more selective than sharing through the GDT, because the sharing can be limited to specific tasks.
7.7 TASK MANAGEMENT IN 64-BIT MODE
In 64-bit mode, task structure and task state are similar to those in protected mode. However, the task switching mechanism available in protected mode is not supported in 64-bit mode.
Although hardware task-switching is not supported in 64-bit mode, a 64-bit task state segment (TSS) must exist. Figure 7-11 shows the format of a 64-bit TSS. The TSS holds information important to 64-bit mode and that is not directly related to the task-switch mechanism.
Figure 7-11. 64-Bit TSS Format [figure: 32-bit-wide layout with byte offsets 0 through 100; fields shown include RSP0, RSP1, and RSP2 (lower 32 bits), the I/O Map Base Address, and reserved bits, which are set to 0]
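One way to read the 64-bit TSS format is as a C structure. The sketch below stores each 64-bit pointer as two 32-bit halves, as the figure does, so the fields land on the right offsets without packing attributes. The IST entries and all field names come from general knowledge of the architecture rather than from the fragment of the figure that survives here, so treat them as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* A 64-bit value stored as lower and upper 32-bit halves. */
struct split64 { uint32_t lo, hi; };

/* Sketch of the 64-bit TSS layout (field names are illustrative). */
struct tss64 {
    uint32_t       reserved0;   /* offset 0 */
    struct split64 rsp[3];      /* RSP0..RSP2 at offsets 4, 12, 20 */
    struct split64 reserved1;   /* offset 28 */
    struct split64 ist[7];      /* IST1..IST7 at offsets 36..91 */
    struct split64 reserved2;   /* offset 92 */
    uint16_t       reserved3;   /* offset 100 */
    uint16_t       io_map_base; /* offset 102 */
};

_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS must be 104 bytes");
_Static_assert(offsetof(struct tss64, io_map_base) == 102, "I/O map base at 66H");

/* Store a stack pointer for privilege level 0, 1, or 2. */
void tss64_set_rsp(struct tss64 *t, unsigned level, uint64_t value)
{
    t->rsp[level].lo = (uint32_t)value;
    t->rsp[level].hi = (uint32_t)(value >> 32);
}

/* Read back the stack pointer for privilege level 0, 1, or 2. */
uint64_t tss64_get_rsp(const struct tss64 *t, unsigned level)
{
    return ((uint64_t)t->rsp[level].hi << 32) | t->rsp[level].lo;
}
```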
CHAPTER 8
MULTIPLE-PROCESSOR MANAGEMENT
The Intel 64 and IA-32 architectures provide mechanisms for managing and improving the performance of multiple processors connected to the same system bus. These include:
• Bus locking and/or cache coherency management for performing atomic operations on system memory.
• To distribute interrupt handling among a group of processors — When several processors are operating in a system in parallel, it is useful to have a centralized mechanism for receiving interrupts and distributing them to available processors for servicing.
software to manage the fairness of semaphores and exclusive locking functions. The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors.
the hardware designer to make the LOCK# signal available in system hardware to control memory accesses among processors.
8.1.2.2 Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location.
ence weakly ordered memory types (such as the WC memory type) may not be serialized. Locked instructions should not be used to ensure that data written can be fetched as instructions.
The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code.
have cached the same area of memory from simultaneously modifying data in that area.
8.2 MEMORY ORDERING
The term memory ordering refers to the order in which the processor issues reads (loads) and writes (stores) through the system bus to system memory.
among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (see Section 8.2.5, “Strengthening or Weakening the Memory-Ordering Model”).
• Locked instructions have a total order.
See the example in Figure 8-1. Consider three processors in a system and each processor performs three writes, one to each of three defined locations (A, B, and C).
8.2.3 Examples Illustrating the Memory-Ordering Principles
This section provides a set of examples that illustrate the behavior of the memory-ordering principles introduced in Section 8.
Section 8.2.3.2 through Section 8.2.3.7 give examples using the MOV instruction. The principles that underlie these examples apply to load and store accesses in general and to other instructions that load from or store to memory.
8.2.3.3 Stores Are Not Reordered With Earlier Loads
The Intel-64 memory-ordering model ensures that a store by a processor may not occur before a previous load by the same processor. This is illustrated by the following example: Assume r1 == 1.
has the two loads occurring before the two stores. This would result in each load returning value 0. The fact that a load may not be reordered with an earli.
8.2.3.6 Stores Are Transitively Visible
The memory-ordering model ensures transitive visibility of stores; stores that are causally related appear to all processors to occur in an order consistent with the causality relation.
By the principles discussed in Section 8.2.3.2,
• processor 2’s first and second load cannot be reordered,
• processor 3’s first and second load cannot be reordered.
• If r1 == 1 and r2 == 0, processor 0’s store appears to precede processor 1’s store with respect to processor 2.
reader should note that reordering is prevented also if the locked instruction is executed after a load or a store. The first example illustrates that loads may not be reordered with earlier locked instructions: As explained in Section 8.
8.2.4 Out-of-Order Stores For String Operations
The Intel Core 2 Duo, Intel Core, Pentium 4, and P6 family processors modify the processor’s operation during the string store operations (initiated with the MOVS and STOS instructions) to maximize performance.
2. Stores from separate string operations (for example, stores from consecutive string operations) do not execute out of order. All the stores from an earlier string operation will complete before any store from a later string operation.
It is possible for processor 1 to perceive that the repeated string stores in processor 0 are happening out of order.
Processor 1 performs two read operations: the first read is from an address outside the 512-byte block but to be updated by processor 0; the second read is from inside the block of memory of string operation.
8.2.5 Strengthening or Weakening the Memory-Ordering Model
The Intel 64 and IA-32 architectures provide several mechanisms for strengthening or weakening the memory-ordering model to handle special programming situations.
as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically.
The PAT was introduced in the Pentium III processor to enhance the caching characteristics that can be assigned to pages or groups of pages.
• Non-privileged serializing instructions — CPUID, IRET, and RSM.
When the processor serializes instruction execution, it ensures that all pending memory transactions are completed (including writes stored in its store buffer) before it executes the next instruction.
execution is not deterministically serialized when a branch instruction is executed.
8.4 MULTIPLE-PROCESSOR (MP) INITIALIZATION
The IA-32 architecture (beginning with the P6 family processors) defines a multiple-processor (MP) initialization protocol called the Multiprocessor Specification Version 1.
8.4.1 BSP and AP Processors
The MP initialization protocol defines two classes of processors: the bootstrap processor (BSP) and the application processors (APs). Following a power-up or RESET of an MP system, system hardware dynamically selects one of the processors on the system bus as the BSP.
8.4.3 MP Initialization Protocol Algorithm for Intel Xeon Processors
Following a power-up or RESET of an MP system, the processors in the system execute the MP initialization protocol algorithm to initialize each of the logical processors on the system bus or coherent link domain.
• The newly established BSP broadcasts an FIPI message to “all including self,” which the BSP and APs treat as an end of MP initialization signal. Only the processor with its BSP flag set responds to the FIPI message.
SVR EQU 0FEE000F0H
APIC_ID EQU 0FEE00020H
LVT3 EQU 0FEE00370H
APIC_ENABLED EQU 0100H
BOOT_ID DD ?
COUNT EQU 00H
VACANT EQU 00H
8.4.4.1 Typical BSP Initialization Sequence
After the BSP and APs have been selected (by means of a hardware protocol, see Section 8.
mode address space (1-MByte space). For example, a vector of 0BDH specifies a start-up memory address of 000BD000H.
MOV EAX, 000C46XXH ; Load ICR encoding from broadcast SIPI IP
; to all APs into EAX where xx is the vector computed in step 8.
16. Waits for the timer interrupt.
17. Reads and evaluates the COUNT variable and establishes a processor count.
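The relationship between the SIPI vector, the AP start-up address, and the ICR value loaded above is plain bit arithmetic. A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* The 8-bit SIPI vector selects a 4-KByte page in the 1-MByte real-mode
 * address space: start address = vector * 1000H. */
uint32_t sipi_start_address(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Low doubleword of the ICR for a broadcast SIPI to all APs: the
 * 000C46XXH encoding shown above with XX replaced by the vector. */
uint32_t sipi_broadcast_icr_low(uint8_t vector)
{
    return 0x000C4600u | vector;
}
```

With vector 0BDH, the first function yields 000BD000H, matching the example in the text, and the second yields 000C46BDH.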
8.4.5 Identifying Logical Processors in an MP System
After the BIOS has completed the MP initialization protocol, each logical processor can be uniquely identified by its local APIC ID.
during power-up and initialization is 8 bits. Bits 2:1 form a 2-bit physical package identifier (which can also be thought of as a socket identifier). In systems that configure physical processors in clusters, bits 4:3 form a 2-bit cluster ID.
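The two fields named here can be pulled out of the 8-bit ID with a shift and a mask. This is a sketch for the specific layout the text describes, with illustrative function names; other processors lay out the APIC ID differently, so the widths should not be hard-coded in general-purpose software.

```c
#include <stdint.h>

/* Bits 2:1 of the 8-bit APIC ID: physical package (socket) identifier. */
unsigned apic_id_package(uint8_t apic_id)
{
    return (apic_id >> 1) & 0x3u;
}

/* Bits 4:3 of the 8-bit APIC ID: cluster ID (clustered systems only). */
unsigned apic_id_cluster(uint8_t apic_id)
{
    return (apic_id >> 3) & 0x3u;
}
```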
8.5 INTEL® HYPER-THREADING TECHNOLOGY AND INTEL® MULTI-CORE TECHNOLOGY
Intel Hyper-Threading Technology and Intel multi-core technology are extens.
number of addressable IDs attributable to processor cores (Y) in the physical package.
• Extended Processor Topology Enumeration parameters for 32-bit APIC ID: Intel 64 processors supporting CPUID leaf 0BH will assign unique APIC IDs to each logical processor in the system.
During initialization, each logical processor is assigned an APIC ID that is stored in the local APIC ID register for each logical processor.
8.7 INTEL® HYPER-THREADING TECHNOLOGY ARCHITECTURE
Figure 8-4 shows a generalized view of an Intel processor supporting Intel Hyper-Threading Technology, using the original Intel Xeon processor MP as an example.
8.7.1 State of the Logical Processors
The following features are part of the architectural state of logical processors within Intel 64 or IA-32 processors supporting Intel Hyper-Threading Technology.
• Debug registers (DR0, DR1, DR2, DR3, DR6, DR7) and the debug control MSRs
• Machine check global status (IA32_MCG_STATUS) and machine check capabilit.
gives software a consistent view of memory, independent of the processor on which it is running. See Section 11.11, “Memory Type Range Registers (MTRRs),” for information on setting up MTRRs. 8.7.
8.7.7 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between the logical processors within a processor core for processors based on Intel NetBurst microarchitecture.
8.7.11 MICROCODE UPDATE Resources
In an Intel processor supporting Intel Hyper-Threading Technology, the microcode update facilities are shared between the logical processors; either logical processor can initiate an update.
As a consequence, the use of the WBINVD instruction can have an impact on interrupt/event response time.
• INVD instruction — The entire cache hierarchy is invalidated without writing back modified data to memory.
disabled on a logical processor basis. Typically, if software controlled clock modulation is going to be used, the feature must be enabled for all the logical processors within a physical processor and the modulation duty cycle must be set to the same value for each logical processor.
8.8 MULTI-CORE ARCHITECTURE
This section describes the architecture of Intel 64 and IA-32 processors supporting dual-core and quad-core technology.
8.8.3 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between two logical processors sharing a processor core if the processor core supports Intel Hyper-Threading Technology and is based on Intel NetBurst microarchitecture.
provided for each logical processor (see Section 8.7, “Intel® Hyper-Threading Technology Architecture,” and Section 8.
If the processor supports CPUID leaf 0BH, the 32-bit APIC ID can represent cluster plus several levels of topology within the physical processor package. The exact number of hierarchical levels within a physical processor package must be enumerated through CPUID leaf 0BH.
8.9.2 Hierarchical Mapping of CPUID Extended Topology Leaf
CPUID leaf 0BH provides enumeration parameters for software to identify each hierarchy of the processor topology in a deterministic manner.
For m = 0, m < N, m ++;
{ cumulative_width[m] = CPUID.(EAX=0BH, ECX= m): EAX[4:0]; }
BitWidth[0] = cumulative_width[0];
For m = 1, m < N, m ++;
BitW.
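The pseudocode is cut off here. A plausible reading, stated as an assumption, is that each level's own bit width is its cumulative width minus the cumulative width of the level below it. The sketch below implements that reading on an array of cumulative widths that the caller has already read from CPUID leaf 0BH; the function name is hypothetical.

```c
#include <stddef.h>

/* Width in bits of the ID field for topology level m, given the cumulative
 * shift widths reported per level (EAX[4:0] of CPUID leaf 0BH).
 * Assumes m is a valid index and the widths are non-decreasing. */
unsigned level_bit_width(const unsigned cumulative_width[], size_t m)
{
    if (m == 0)
        return cumulative_width[0];
    return cumulative_width[m] - cumulative_width[m - 1];
}
```

For cumulative widths of 1 (SMT) and 5 (core), the SMT field is 1 bit wide and the core field is 4 bits wide.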
Table 8-2 shows the initial APIC IDs for a hypothetical situation with a dual processor system. Each physical package provides two processor cores, and each processor core also supports Intel Hyper-Threading Technology.
8.9.3.1 Hierarchical ID of Logical Processors with x2APIC ID
Table 8-3 shows an example of possible x2APIC ID assignments for a dual processor system that supports x2APIC.
8.9.4 Algorithm for Three-Level Mappings of APIC_ID
Software can gather the initial APIC_IDs for each logical processor supported by the operating system at runtime and extract identifiers corresponding to the three levels of sharing topology (package, core, and SMT).
a. Query the right-shift value for the SMT level of the topology using CPUID leaf 0BH with ECX = 0H as input. The number of bits to shift-right on x2APIC ID (EAX[4:0]) can distinguish different higher-level entities above SMT (e.
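Once the shift counts for the SMT and core levels are known, splitting an x2APIC ID into its three identifiers is a matter of masks and shifts. The sketch below takes the two cumulative shift counts as parameters (in practice they would come from CPUID leaf 0BH) and returns right-aligned numerical IDs; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

struct topology_ids {
    uint32_t smt_id;     /* logical processor within a core */
    uint32_t core_id;    /* core within a package */
    uint32_t package_id; /* physical package */
};

/* smt_shift:  bits to shift right to get past the SMT field.
 * core_shift: bits to shift right to get past the SMT and core fields.
 * Both are assumed to be below 32, with core_shift >= smt_shift. */
struct topology_ids split_x2apic_id(uint32_t x2apic_id,
                                    unsigned smt_shift, unsigned core_shift)
{
    assert(smt_shift <= core_shift && core_shift < 32);
    struct topology_ids ids;
    ids.smt_id     = x2apic_id & ((1u << smt_shift) - 1u);
    ids.core_id    = (x2apic_id >> smt_shift) &
                     ((1u << (core_shift - smt_shift)) - 1u);
    ids.package_id = x2apic_id >> core_shift;
    return ids;
}
```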
Example 8-18. Support Routines for Detecting Hardware Multi-Threading and Identifying the Relationships Between Package, Core and Logical Processors
1. Detect support for Hardware Multi-Threading Support in a processor.
int DeriveCore_Mask_Offsets (void)
{
if (!HWMTSupported()) return -1;
execute cpuid with eax = 11, ECX = 0;
while( ECX[15:8] ) { // level type encoding is.
unsigned char MaxLPIDsPerPackage(void)
{
if (!HWMTSupported()) return 1;
execute cpuid with eax = 1
store returned value of ebx
return (unsigned char) ((reg_ebx & NUM_LOGICAL_BITS) >> 16);
}
b. Find the size of address space for processor cores in a physical processor package.
// Returns the mask bit width of a bit field from the maximum count that bit field can represent.
// This algorithm does not assume ‘address size’ to have a value equal to power of 2.
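The routine these comments describe is not included in this excerpt. One possible implementation is sketched below: it counts the bits needed to represent the largest ID, count minus 1, which also works when the count is not a power of 2. The function name is illustrative.

```c
#include <stdint.h>

/* Number of bits needed for a field that must hold `count` distinct IDs
 * (0 .. count-1). The count does not have to be a power of 2. */
unsigned find_mask_width(uint32_t count)
{
    unsigned width = 0;
    if (count <= 1)
        return 0;               /* a single ID needs no bits */
    uint32_t max_id = count - 1;
    while (max_id != 0) {       /* count significant bits of the largest ID */
        width++;
        max_id >>= 1;
    }
    return width;
}
```

A count of 3 yields a width of 2, the same as a count of 4, because the field must still hold ID value 2.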
Software must not assume local APIC_ID values in an MP system are consecutive. Non-consecutive local APIC_IDs may be the result of hardware configurations or debug features implemented in the BIOS or OS.
example also depicts a technique to construct a mask to represent the logical processors that reside in the same core. In Example 8-21, the numerical ID value can be obtained from the value extracted with the mask by shifting it right by shift count.
8-62 Vol. 3 MULTIPLE-PR OCESSOR MANAGEMENT using OS specif ic APIs. // Allocate per processor arrays t o store the Package_ID, Core_ID an d SMT_ID for every st arted // processor.
Vol. 3 8-63 MULTIPLE-PR OCESSOR MANAGE MENT PackagePro cessorMask[0] = Proc essorMask; For (ProcessorNum = 1; Processo rNum < NumStartedLPs; ProcessorNum++) { ProcessorMask << = 1; For (i=0; .
8-64 Vol. 3 MULTIPLE-PROCESSOR MANAGEMENT } if (i == CoreNum) { //Did not match any bucket, start new bucket CoreIDBucket[i] = PackageID[ProcessorNum] | CoreID[ProcessorNum]; CoreProcessorMask[i.
Vol. 3 8-65 MULTIPLE-PROCESSOR MANAGEMENT 8.10.2 PAUSE Instruction The PAUSE instruction can improve the performance of processors supporting Intel Hyper-Threading Technology when executing “spin-wait loops” and other routines where one thread is accessing a shared lock or semaphore in a tight polling loop.
8-66 Vol. 3 MULTIPLE-PROCESSOR MANAGEMENT 8.10.4 MONITOR/MWAIT Instruction Operating systems usually implement idle loops to handle thread synchronization. In a typical idle-loop scenario, there could be several “busy loops” and they would use a set of memory locations.
Vol. 3 8-67 MULTIPLE-PROCESSOR MANAGEMENT Power management related events (such as Thermal Monitor 2 or chipset driven STPCLK# assertion) will not cause the monitor event pending flag to be cleared. Faults will not cause the monitor event pending flag to be cleared.
8-68 Vol. 3 MULTIPLE-PROCESSOR MANAGEMENT the two parameters should default to be the same (the size of the monitor triggering area is the same as the system coherence line size). Based on the monitor line sizes returned by CPUID, the OS should dynamically allocate structures with appropriate padding.
Vol. 3 8-69 MULTIPLE-PROCESSOR MANAGEMENT JE Get_Lock PAUSE ;Short delay JMP Spin_Lock Get_Lock: MOV EAX, 1 XCHG EAX, lockvar ;Try to get lock CMP EAX, 0 ;Test if successful JNE Spin_Lock Critical_Section: <critical section code> MOV lockvar, 0 .
8-70 Vol. 3 MULTIPLE-PROCESSOR MANAGEMENT // C1 handler uses a Halt instruction VOID C1Handler() { STI HLT } The MONITOR and MWAIT instructions may be considered for use in the C0 idle state loops, if MONITOR and MWAIT are supported. Example 8-25.
Vol. 3 8-71 MULTIPLE-PROCESSOR MANAGEMENT } 8.10.6.3 Halt Idle Logical Processors If one of two logical processors is idle or in a spin-wait loop of long duration, explicitly halt that processor by means of a HLT instruction.
8-72 Vol. 3 MULTIPLE-PROCESSOR MANAGEMENT { MONITOR WorkQueue // Setup of eax with WorkQueue LinearAddress, // ECX, EDX = 0 IF (WorkQueue != 0) THEN { STI MWAIT // EAX, ECX = 0 } } 8.
Vol. 3 8-73 MULTIPLE-PROCESSOR MANAGEMENT • A high resolution timer within the processor (such as the local APIC timer or the time-stamp counter). For additional information, see the Intel® 64 and IA-32 Architectures Optimization Reference Manual.
Vol. 3 9-1 CHAPTER 9 PROCESSOR MANAGEMENT AND INITIALIZATION This chapter describes the facilities provided for managing processor wide functions and for initializing the processor.
9-2 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION The software-initialization code performs all system-specific initialization of the BSP or primary processor and the system logic.
Vol. 3 9-3 PROCESSOR MANAGEMENT AND INITIALIZATION CR2, CR3, CR4 00000000H 00000000H 00000000H CS Selector = F000H Base = FFFF0000H Limit = FFFFH AR = Present, R/W, Accessed Selector = F000H Base .
9-4 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION LDTR, Task Register Selector = 0000H Base = 00000000H Limit = FFFFH AR = Present, R/W Selector = 0000H Base = 00000000H Limit = FFFFH AR = Pre.
Vol. 3 9-5 PROCESSOR MANAGEMENT AND INITIALIZATION 9.1.3 Model and Stepping Information Following a hardware reset, the EDX register contains component identification and revision information (see Figure 9-2).
9-6 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.1.4 First Instruction Executed The first instruction that is fetched and executed following a hardware reset is located at physical address FFFFFFF0H. This address is 16 bytes below the processor’s uppermost physical address.
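The reset address can be checked with simple arithmetic. The sketch below combines the CS base listed in the reset-state table (FFFF0000H) with an EIP reset value of 0000FFF0H; the EIP value is an assumption here, since it does not appear in the excerpt above, and a 32-bit physical address space is assumed for the "uppermost address" comparison.

```python
CS_BASE_AFTER_RESET = 0xFFFF0000  # CS base from the reset-state table
EIP_AFTER_RESET = 0x0000FFF0      # assumed EIP value after reset


def reset_vector() -> int:
    """Physical address of the first instruction fetched after reset."""
    return (CS_BASE_AFTER_RESET + EIP_AFTER_RESET) & 0xFFFFFFFF


def bytes_below_top(address: int, address_bits: int = 32) -> int:
    """Distance from an address to the end of the address space."""
    return (1 << address_bits) - address
```

Under these assumptions, reset_vector() evaluates to FFFFFFF0H, which lies 16 bytes below the 4-GByte boundary.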
Vol. 3 9-7 PROCESSOR MANAGEMENT AND INITIALIZATION The EM flag determines whether floating-point instructions are executed by the x87 FPU (EM is cleared) or a device-not-available exception (#NM) is generated for all floating-point instructions so that an exception handler can emulate the floating-point operation (EM = 1).
9-8 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION • It allows x87 FPU code to run on an IA-32 processor that neither has an integrated x87 FPU nor is connected to an external math coprocessor, by using a floating-point emulator.
Vol. 3 9-9 PROCESSOR MANAGEMENT AND INITIALIZATION 9.4 MODEL-SPECIFIC REGISTERS (MSRS) Most IA-32 processors (starting from Pentium processors) and Intel 64 processors contain model-specific registers (MSRs). A given MSR may not be supported across all families and models for Intel 64 and IA-32 processors.
9-10 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION all the MTRRs must be cleared to 0, which selects the uncached (UC) memory type. See Section 11.
Vol. 3 9-11 PROCESSOR MANAGEMENT AND INITIALIZATION mode. The protected-mode data structures that must be loaded are described in Section 9.8, “Software Initialization for Protected-Mode Operation.
9-12 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION modules into memory to support reliable operation of the processor in protected mode. These data structures include the following: • An IDT. • A GDT. • A TSS. • (Optional) An LDT. • If paging is to be used, at least one page directory and one page table.
Vol. 3 9-13 PROCESSOR MANAGEMENT AND INITIALIZATION descriptors in the GDT. Some operating systems allocate new segments and LDTs as they are needed. This provides maximum flexibility for handling a dynamic programming environment. However, many operating systems use a single LDT for all tasks, allocating GDT entries in advance.
9-14 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.8.4 Initializing Multitasking If the multitasking mechanism is not going to be used and changes between privilege levels are not allowed, it is not necessary to load a TSS into memory or to initialize the task register.
Vol. 3 9-15 PROCESSOR MANAGEMENT AND INITIALIZATION following instructions must be located in an identity-mapped page (until such time that a branch to non-identity mapped pages can be effected). 64-bit mode paging tables must be located in the first 4 GBytes of physical-address space prior to activating IA-32e mode.
9-16 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.8.5.3 64-bit Mode and Compatibility Mode Operation IA-32e mode uses two code segment-descriptor bits (CS.L and CS.D, see Figure 3-8) to control the operating modes after IA-32e mode is initialized.
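How the two bits select a sub-mode can be sketched as a small decoder. The encodings used below follow the commonly documented reading of CS.L and CS.D and are stated as assumptions, since the excerpt above does not list them; the function name is hypothetical.

```python
def ia32e_submode(cs_l: int, cs_d: int) -> str:
    """Classify the IA-32e sub-mode selected by CS.L and CS.D (assumed encodings)."""
    if cs_l == 1 and cs_d == 0:
        return "64-bit mode"
    if cs_l == 0:
        # In compatibility mode, CS.D is assumed to select the default size.
        if cs_d == 1:
            return "compatibility mode, 32-bit default"
        return "compatibility mode, 16-bit default"
    # CS.L = 1 with CS.D = 1 is assumed to be a reserved combination.
    return "reserved"
```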
Vol. 3 9-17 PROCESSOR MANAGEMENT AND INITIALIZATION from 64-bit mode through compatibility mode to legacy or real mode and then back through compatibility mode to 64-bit mode. 9.9 MODE SWITCHING To use the processor in protected mode after hardware or software reset, a mode switch must be performed from real-address mode.
9-18 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 7. If a local descriptor table is going to be used, execute the LLDT instruction to load the segment selector for the LDT in the LDTR register.
Vol. 3 9-19 PROCESSOR MANAGEMENT AND INITIALIZATION 4. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor containing the following values, which are appropriate for real-a.
9-20 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION • Load the system registers with the necessary pointers to the data structures and the appropriate flag settings for protected-mode operation.
Vol. 3 9-21 PROCESSOR MANAGEMENT AND INITIALIZATION Figure 9-3. Processor State After Reset Table 9-4. Main Initialization Steps in STARTUP.
9-22 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.10.1 Assembler Usage In this example, the Intel assembler ASM386 and build tools BLD386 are used to assemble and build the initialization code module. The following assumptions are used when using the Intel ASM386 and BLD386 tools.
Vol. 3 9-23 PROCESSOR MANAGEMENT AND INITIALIZATION 9.10.2 STARTUP.ASM Listing Example 9-1 provides high-level sample code designed to move the processor into protected mode. This listing does not include any opcode and offset information. Example 9-1.
9-24 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 28 ; RAM_START will contain the linear address of the first 29 ; free byte above the copied tables - this may be useful if 30 ; a memory manager is used.
Vol. 3 9-25 PROCESSOR MANAGEMENT AND INITIALIZATION 71 SS_reg DW ? 72 SS_h DW ? 73 DS_reg DW ? 74 DS_h DW ? 75 FS_reg DW ? 76 FS_h DW ? 77 GS_reg DW ? 78 GS_h DW ? 79 LDT_reg DW ? 80 LDT_h DW ? 81 TR.
9-26 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 114 115 ; ------------------------- DATA SEGMENT---------------------- 116 117 ; Initially, this data segment starts at linear 0, according 118 ; to the processor’s power-up state.
Vol. 3 9-27 PROCESSOR MANAGEMENT AND INITIALIZATION 159 ; DS,ES address the bottom 64K of flat linear memory 160 ASSUME DS:STARTUP_DATA, ES:STARTUP_DATA 161 ; See Figure 9-4 162 ; load GDTR with tem.
9-28 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 201 MOV ECX, CS_BASE 202 ADD ECX, OFFSET (GDT_EPROM) 203 MOV ESI, [ECX].table_linear 204 MOV EDI,EAX 205 MOVZX ECX, [ECX].table_lim 206 MOV APP_GDT_ram[EBX].table_lim,CX 207 INC ECX 208 MOV EDX,EAX 209 MOV APP_GDT_ram[EBX].
Vol. 3 9-29 PROCESSOR MANAGEMENT AND INITIALIZATION 246 247 ; move the TSS 248 MOV EDI,EAX 249 MOV EBX,TSS_INDEX*SIZE(DESC) 250 MOV ECX,GDT_DESC_OFF ;build linear address for TSS 251 MOV GS,CX 252 MOV DH,GS:[EBX].bas_24_31 253 MOV DL,GS:[EBX].bas_16_23 254 ROL EDX,16 255 MOV DX,GS:[EBX].
9-30 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 289 PUSH DWORD PTR [EDX].EIP_reg 290 MOV AX,[EDX].DS_reg 291 MOV BX,[EDX].ES_reg 292 MOV DS,AX ; DS and ES no longer linear memory 293 MOV ES,BX.
Vol. 3 9-31 PROCESSOR MANAGEMENT AND INITIALIZATION Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of List File) Figure labels: FFFFFFFFH; Base=0, Limit=4G; START: [CS.
9-32 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File) Figure labels: FFFFFFFFH; GDT RAM; • Move the GDT, IDT, TSS • Fix .
Vol. 3 9-33 PROCESSOR MANAGEMENT AND INITIALIZATION 9.10.3 MAIN.ASM Source Code The file MAIN.ASM shown in Example 9-2 defines the data and stack segments for this application and can be substituted with the main module task written in a high-level language that is invoked by the IRET instruction executed by STARTUP.
9-34 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION CODE SEGMENT ER use32 PUBLIC main_start: nop nop nop CODE ENDS END main_start, ds:data, ss:stack 9.10.4 Supporting Files The batch file shown in Example 9-3 can be used to assemble the source code files STARTUP.
Vol. 3 9-35 PROCESSOR MANAGEMENT AND INITIALIZATION TABLE GDT ( LOCATION = GDT_EPROM , ENTRY = ( 10: PROTECTED_MODE_TASK , startup.startup_code , startup.startup_data , main_module.data , main_module.code , main_module.stack ) ), IDT ( LOCATION = IDT_EPROM ); MEMORY ( RESERVE = (0.
9-36 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.11 MICROCODE UPDATE FACILITIES The Pentium 4, Intel Xeon, and P6 family processors have the capability to correct errata by loading an Intel-supplied data block into the processor. The data block is called a microcode update.
Vol. 3 9-37 PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.1 Microcode Update A microcode update consists of an Intel-supplied binary that contains a descriptive header and data. No executable code resides within the update. Each microcode update is tailored for a specific list of processor signatures.
9-38 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION NOTE The optional extended signature table is supported starting with processor family 0FH, model 03H. Table 9-6. Microcode Update Field Definitions Field Name Offset (bytes) Length (bytes) Description Header Version 0 4 Version number of the update header.
Vol. 3 9-39 PROCESSOR MANAGEMENT AND INITIALIZATION Reserved 36 12 Reserved fields for future expansion Update Data 48 Data Size or 2000 Update data Extended Signature Count Data Size + 48 4 S.
9-40 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-7. Microcode Update Format 31 24 16 8 0 Bytes Header Version 0 Update Revision 4 Month: 8 Day: 8 Year: 16 8 Processor Signature .
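The 48-byte header can be unpacked with a few lines of code. The sketch below is illustrative only: the offsets for Header Version (0), Update Revision (4), the date dword (8), the reserved area (36, 12 bytes) and the update data (48) come from the tables above, while the order of the fields between offsets 12 and 35 (Processor Signature, Checksum, Loader Revision, Processor Flags, Data Size, Total Size) and the BCD reading of the date are assumptions. All Python names are hypothetical.

```python
import struct

# Nine little-endian dwords followed by 12 reserved bytes (48 bytes total).
HEADER_FORMAT = "<9I12x"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_update_header(blob: bytes) -> dict:
    """Decode the fixed-size microcode update header (assumed field order)."""
    (header_version, update_revision, date, processor_signature,
     checksum, loader_revision, processor_flags,
     data_size, total_size) = struct.unpack_from(HEADER_FORMAT, blob, 0)
    return {
        "header_version": header_version,
        "update_revision": update_revision,
        # Month: 8 bits, Day: 8 bits, Year: 16 bits, read as BCD digits.
        "date": "%02x/%02x/%04x" % ((date >> 24) & 0xFF,
                                    (date >> 16) & 0xFF,
                                    date & 0xFFFF),
        "processor_signature": processor_signature,
        "checksum": checksum,
        "loader_revision": loader_revision,
        "processor_flags": processor_flags,
        # The table gives the data length as "Data Size or 2000".
        "data_size": data_size or 2000,
        "total_size": total_size,
    }
```

The update data would then begin at byte offset HEADER_SIZE of the same buffer.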
Vol. 3 9-41 PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.2 Optional Extended Signature Table The extended signature table is a structure that may be appended to the end of the encrypted data when the encrypted data only supports a single processor signature (optional case).
9-42 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION a processor signature embedded in the microcode update with the processor signature returned by CPUID will cause the BIOS to reject the update. Example 9-5 shows how to check for a valid processor signature match between the processor and microcode update.
Vol. 3 9-43 PROCESSOR MANAGEMENT AND INITIALIZATION The three platform ID bits, when read as a binary coded decimal (BCD) number, indicate the bit position in the microcode update header’s processor flags field associated with the installed processor.
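The translation from the three-bit platform ID to a bit in the processor flags field can be sketched as follows. The sketch assumes the platform ID occupies bits 52:50 of the IA32_PLATFORM_ID MSR value, which is not stated in the excerpt above; the function names are hypothetical.

```python
PLATFORM_ID_SHIFT = 50  # assumed position of the platform ID field (bits 52:50)


def platform_flag_mask(ia32_platform_id: int) -> int:
    """Translate the 3-bit platform ID into a one-hot processor-flags mask."""
    platform_id = (ia32_platform_id >> PLATFORM_ID_SHIFT) & 0x7
    return 1 << platform_id


def platform_matches(ia32_platform_id: int, processor_flags: int) -> bool:
    """True if the update's processor flags include this platform."""
    return bool(processor_flags & platform_flag_mask(ia32_platform_id))
```

A platform ID of 4, for example, corresponds to bit 4 (10H) of the processor flags field.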
9-44 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION } Else { // // Assume the Data Size has been used to calculate the // location of Update.ProcessorSignature[N] and a match // on Update.ProcessorSignature[N] has already succeeded // If (Update.
Vol. 3 9-45 PROCESSOR MANAGEMENT AND INITIALIZATION If (ChkSum == 00000000H) Success Else Fail 9.11.6 Microcode Update Loader This section describes an update loader used to load an update into a Pentium 4, Intel Xeon, or P6 family processor. It also discusses the requirements placed on the BIOS to ensure proper loading.
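The checksum test shown at the top of this page (ChkSum == 00000000H) can be sketched in a few lines. The sketch assumes the checksum is the 32-bit wraparound sum of every little-endian dword in the update; the function name is hypothetical.

```python
import struct


def update_checksum_ok(update: bytes) -> bool:
    """Return True if all dwords of the update sum to zero modulo 2**32."""
    if len(update) % 4:
        # An update that is not a whole number of dwords cannot be summed.
        return False
    total = 0
    for (dword,) in struct.iter_unpack("<I", update):
        total = (total + dword) & 0xFFFFFFFF
    return total == 0
```

A loader would typically run such a check before attempting to load the update.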
9-46 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION • ECX contains 79H (address of IA32_BIOS_UPDT_TRIG). Other requirements are: • If the update is loaded while the processor is in real mode, then the update data may not cross a segment boundary.
Vol. 3 9-47 PROCESSOR MANAGEMENT AND INITIALIZATION If the processor core supports Intel Hyper-Threading Technology, the guideline described in Section 9.11.6.3 also applies. 9.11.6.5 Update Loader Enhancements The update loader presented in Section 9.
9-48 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.7.1 Determining the Signature An update that is successfully loaded into the processor provides a signature that matches the update revision of the currently functioning revision. This signature is available any time after the actual update has been loaded.
Vol. 3 9-49 PROCESSOR MANAGEMENT AND INITIALIZATION Example 9-10. Pseudo Code to Authenticate the Update Z ← Obtain Update Revision from the Update Header to be authenticated; X ← Obtain Curren.
9-50 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION There are no optional functions. BIOS must load the appropriate update for each processor during system initialization. A Header Version of an update block containing the value 0FFFFFFFFH indicates that the update block is unused and available for storing a new update.
Vol. 3 9-51 PROCESSOR MANAGEMENT AND INITIALIZATION These requirements are checked by the BIOS during the execution of the write update function of this interface. The BIOS sequentially scans through all of the update blocks in NVRAM starting with index 0.
9-52 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION } } NOTES The platform ID bits in IA32_PLATFORM_ID are encoded as a three-bit binary coded decimal field. The platform bits in the microcode update header are individually bit encoded. The algorithm must do a translation from one format to the other prior to doing a check.
Vol. 3 9-53 PROCESSOR MANAGEMENT AND INITIALIZATION Example 9-12. INT 15 D042 Calling Program Pseudo-code // // We must be in real mode // If the system is not in Real mode exit // // Detect presen.
9-54 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION // Do we have enough update slots for all CPUs? // If there are more blocks required to support the unique processor steppings than update block.
Vol. 3 9-55 PROCESSOR MANAGEMENT AND INITIALIZATION } // // Compare the Update read to that written // If (Update read != Update written) { Display Diagnostic exit } I ← I + (size of microcode update / 2048) } // // Enable Update Loading, and inform user // Issue the Update Control function with Task = Enable.
9-56 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION In general, each function returns with CF cleared and with AH containing the returned status. The general return codes and other constant definitions are listed in Section 9.
Vol. 3 9-57 PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.8.6 Function 01H—Write Microcode Update Data This function integrates a new microcode update into the BIOS storage device. Table 9-14 lists the parameters and return codes for the function.
9-58 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION Description The BIOS is responsible for selecting an appropriate update block in the non-volatile storage for storing the new update.
Vol. 3 9-59 PROCESSOR MANAGEMENT AND INITIALIZATION Finally, before storing the proposed update in NVRAM, the BIOS must verify the authenticity of the update via the mechanism described in Section 9.
9-60 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION Figure 9-8. Microcode Update Write Operation Flow [1] Flowchart labels: Valid Update Header Version? Loader Revision Match BIOS’s Loader? Does.
Vol. 3 9-61 PROCESSOR MANAGEMENT AND INITIALIZATION Figure 9-9. Microcode Update Write Operation Flow [2] Flowchart labels: Return INVALID_REVISION; Update Revision Newer Than NVRAM Update? Update Pas.
9-62 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.8.7 Function 02H—Microcode Update Control This function enables loading of binary updates into the processor. Table 9-15 lists the parameters and return codes for the function. This control is provided on a global basis for all updates and processors.
Vol. 3 9-63 PROCESSOR MANAGEMENT AND INITIALIZATION The READ_FAILURE error code returned by this function has meaning only if the control function is implemented in the BIOS NVRAM. The state of this feature (enabled/disabled) can also be implemented using CMOS RAM bits where READ failure errors cannot occur.
9-64 Vol. 3 PROCESSOR MANAGEMENT AND INITIALIZATION The read function enables the caller to read any microcode update data that already exists in a BIOS and make decisions about the addition of new updates.
Vol. 3 9-65 PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-18. Return Code Definitions Return Code Value Description SUCCESS 00H The function completed successfully. NOT_IMPLEMENTED 86H The function is not implemented. ERASE_FAILURE 90H A failure because of the inability to erase the storage device.
Vol. 3 10-1 CHAPTER 10 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The Advanced Programmable Interrupt Controller (APIC), referred to in the following sections as the local APIC, was introduced into the IA-32 processors with the Pentium processor (see Section 19.
10-2 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) interrupt pins (LINT0 and LINT1). The I/O devices may also be connected to an 8259-type interrupt controller that is in turn connected to the processor through one of the local interrupt pins.
Vol. 3 10-3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) IPIs can be sent to other processors in the system or to the originating processor (self-interrupts).
10-4 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) also be delivered to the individual processors through the local interrupt pins; however, this mechanism is commonly not used in MP systems. Figure 10-2. Local APICs and I/O APIC When Intel Xeon Processors Are Used in Multiple-Processor Systems Figure 10-3.
Vol. 3 10-5 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The IPI mechanism is typically used in MP systems to send fixed interrupts (interrupts for a specific vector number) and special-purpose interrupts to processors on the system bus.
10-6 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) forward extendability for future Intel platform innovations. These extensions and modifications are noted in the following sections. 10.4 LOCAL APIC The following sections describe the architecture of the local APIC and how to detect it, identify it, and determine its status.
Vol. 3 10-7 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Figure 10-4. Local APIC Structure Figure labels: Current Count Register; Initial Count Register; Divide Configuration Register; Version Register; Error.
10-8 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Table 10-1 shows how the APIC registers are mapped into the 4-KByte APIC register space. Registers are 32 bits, 64 bits, or 256 bits in width; all are aligned on 128-bit boundaries. All 32-bit registers should be accessed using 128-bit aligned 32-bit loads or stores.
Vol. 3 10-9 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) FEE0 00F0H Spurious Interrupt Vector Register Bits 0-8 Read/Write; bits 9-31 Read Only. FEE0 0100H In-Service Register (ISR); bits 0:31 Read Only. FEE0 0110H In-Service Register (ISR); bits 32:63 Read Only.
10-10 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.4.2 Presence of the Local APIC Beginning with the P6 family processors, the presence or absence of an on-chip local APIC can be detected using the CPUID instruction.
Vol. 3 10-11 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 1. Using the APIC global enable/disable flag in the IA32_APIC_BASE MSR (MSR address 1BH; see Figure 10-5): — When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC.
10-12 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) • APIC Global Enable flag, bit 11: Enables or disables the local APIC (see Section 10.4.3, “Enabling or Disabling the Local APIC”). This flag is available in the Pentium 4, Intel Xeon, and P6 family processors.
Vol. 3 10-13 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) this, operating system software should avoid writing to the local APIC ID register. The value returned by bits 31-24 of the EBX regis.
10-14 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) x2APIC will introduce 32-bit ID; see Section 10.5. 10.4.7.1 Local APIC State After Power-Up or Reset Following a power-up or RE.
Vol. 3 10-15 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) • The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored. • (For Pentium and P6 family processors) The local APIC continues to listen to all bus messages in order to keep its arbitration ID synchronized with the rest of the system.
10-16 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.5 EXTENDED XAPIC (X2APIC) The x2APIC architecture extends the xAPIC architecture (described in Section 10.4) in a backward compatible manner and provides forward extendability for future Intel platform innovations.
Vol. 3 10-17 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Table 10-2, “x2APIC operating mode configurations” describes the possible combinations of the enable bit (EN - bit 11) and the extended mode bit (EXTD - bit 10) in the IA32_APIC_BASE MSR.
10-18 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 32-bit register. Similarly, executing the WRMSR instruction with the APIC register address in ECX writes bits 0 to 31 of register EAX to bits 0 to 31 of the specified APIC register.
Vol. 3 10-19 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 0080H 008H Task Priority Register (TPR) Read/Write. Bits 7:0 are RW. Bits 31:8 are Reserved. 0090H 009H Reserved 00A0H 00AH Processor Priority Register (PPR) Read only.
10-20 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 01F0H 01FH TMR bits 224:255 Read Only. 0200H 020H Interrupt Request Register (IRR); bits 0:31 Read Only. 0210H 021H IRR bits 32:63 Read Only. 0220H 022H IRR bits 64:95 Read Only.
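The table rows above pair each memory-mapped register offset with an MSR offset that is the same value shifted right by four bits (0080H with 008H, 0200H with 020H). A minimal sketch of that mapping follows; it assumes the x2APIC MSR range starts at MSR address 800H, which is not shown in the excerpt, and the function name is hypothetical.

```python
X2APIC_MSR_BASE = 0x800  # assumed base address of the x2APIC MSR range


def x2apic_msr_address(mmio_offset: int) -> int:
    """Map a 16-byte-aligned xAPIC register offset to an x2APIC MSR address."""
    if mmio_offset % 0x10 or not 0 <= mmio_offset < 0x1000:
        raise ValueError("offset must be 16-byte aligned and inside the 4-KByte page")
    return X2APIC_MSR_BASE + (mmio_offset >> 4)
```

Under this assumption the Task Priority Register at offset 0080H would be reached through MSR 808H. Registers whose layout changes in x2APIC mode may not follow this simple rule.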
Vol. 3 10-21 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.5.1.3 Reserved Bit Checking Section 10.5.1.2 and Table 10-3 specify the reserved bit definitions for the APIC registers in x2APIC mode.
10-22 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) to enable BIOS and/or platform firmware to re-configure the x2APIC IDs in some clusters to provide for unique and non-overlapping system wide IDs before configuring the disconnected components into a single system.
Vol. 3 10-23 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) field, VM-exit MSR-load address field, and VM-entry MSR-load address field in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B). The x2APIC MSRs cannot be loaded and stored on VMX transitions.
10-24 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The default value for SVR[bit 12] is clear, indicating that an EOI broadcast will be performed. The support for Directed EOI capability can be detected by means of bit 24 in the Local APIC Version Register.
Vol. 3 10-25 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) • xAPIC mode: IA32_APIC_BASE[EN]=1 and IA32_APIC_BASE[EXTD]=0 • x2APIC mode: IA32_APIC_BASE[EN]=1 and IA32_APIC_BASE[EXTD]=1 .
10-26 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) x2APIC After RESET The valid transitions from the xAPIC mode state are: • to the x2APIC mode by setting EXTD to 1 (resulting EN=1, EXTD=1).
Vol. 3 10-27 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) x2APIC Transitions From x2APIC Mode From the x2APIC mode, the only valid x2APIC transition using IA32_APIC_BASE is to the state where the x2APIC is disabled by setting EN to 0 and EXTD to 0.
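The mode encoding and the transition rules can be modelled with a small sketch. The xAPIC-to-x2APIC and x2APIC-to-disabled transitions are taken from the text above; treating disabled-to-xAPIC and xAPIC-to-disabled as valid, and treating EN=0 with EXTD=1 as an invalid state, are assumptions made here. All names are hypothetical.

```python
EN_BIT = 1 << 11    # IA32_APIC_BASE[EN]
EXTD_BIT = 1 << 10  # IA32_APIC_BASE[EXTD]

# (old mode, new mode) pairs accepted by this model.
VALID_TRANSITIONS = {
    ("disabled", "xAPIC"),   # assumed
    ("xAPIC", "disabled"),   # assumed
    ("xAPIC", "x2APIC"),     # set EXTD to 1
    ("x2APIC", "disabled"),  # set EN to 0 and EXTD to 0
}


def apic_mode(apic_base: int) -> str:
    """Classify an IA32_APIC_BASE value by its EN and EXTD bits."""
    en = bool(apic_base & EN_BIT)
    extd = bool(apic_base & EXTD_BIT)
    if en and extd:
        return "x2APIC"
    if en:
        return "xAPIC"
    return "invalid" if extd else "disabled"


def transition_allowed(old_base: int, new_base: int) -> bool:
    """True if writing new_base over old_base is a valid mode change."""
    old_mode, new_mode = apic_mode(old_base), apic_mode(new_base)
    if new_mode == "invalid":
        return False
    return old_mode == new_mode or (old_mode, new_mode) in VALID_TRANSITIONS
```

In this model, leaving x2APIC mode for xAPIC mode requires passing through the disabled state first.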
10-28 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Support for the x2APIC architecture can be implemented in the local APIC unit. All existing PCI/MSI capable devices and IOxAPIC units should work with the x2APIC extensions defined in this document.
Vol. 3 10-29 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The extended topology enumeration leaf is intended to assist software with enumerating processor topology on systems that require 32-bit x2APIC IDs to address individual logical processors.
10-30 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.6 HANDLING LOCAL INTERRUPTS The following sections describe facilities that are provided in the local APIC for handling local interrupts.
Vol. 3 10-31 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Figure 10-12. Local Vector Table (LVT) Figure labels: Vector; Timer Mode (0: One-shot, 1: Periodic); Delivery Mode (000: Fix.
10-32 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The setup information that can be specified in the registers of the LVT table is as follows: Vector Interrupt vector number. Delivery Mode Specifies the type of interrupt to be sent to the processor.
Vol. 3 10-33 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Interrupt Input Pin Polarity Specifies the polarity of the corresponding interrupt pin: (0) active high or (1) active low.
10-34 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.6.3 Error Handling The local APIC provides an error status register (ESR) that it uses to record errors that it detects when handling interrupts (see Figure 10-13). An APIC error interrupt is generated when the local APIC sets one of the error bits in the ESR.
Vol. 3 10-35 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.6.3.1 x2APIC Differences in Error Handling RDMSR and WRMSR operations to reserved addresses in the x2APIC mode will raise a GP fault. Additionally, reserved bit violations cause GP faults as detailed in Section 10.
10-36 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) If the ICR is programmed with lowest priority delivery mode then the "Re-directible IPI" bit will be set in x2APIC modes (same as legacy xAPIC behavior) and the interrupt will not be processed.
Vol. 3 10-37 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The time base for the timer is derived from the processor’s bus clock, divided by the value specified in the divide configuration register. The timer can be configured through the timer LVT entry for one-shot or periodic operation.
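The relationship between bus clock, divide value and timer period can be worked through in code. The sketch below assumes the commonly documented encoding of the divide configuration register (bits 3, 1 and 0 select divide-by-2 through divide-by-128, with 111 selecting divide-by-1); that encoding and all names are assumptions, since the excerpt above does not list them.

```python
# Assumed encoding of divide configuration register bits {3, 1, 0}.
DIVIDE_VALUES = {
    0b000: 2, 0b001: 4, 0b010: 8, 0b011: 16,
    0b100: 32, 0b101: 64, 0b110: 128, 0b111: 1,
}


def divide_value(dcr: int) -> int:
    """Decode the divisor from a divide configuration register value."""
    code = ((dcr >> 1) & 0b100) | (dcr & 0b011)  # gather bits 3, 1, 0
    return DIVIDE_VALUES[code]


def timer_period_seconds(initial_count: int, dcr: int, bus_hz: float) -> float:
    """Time for the counter to run from initial_count down to zero."""
    return initial_count * divide_value(dcr) / bus_hz
```

With a 100 MHz bus clock, a divisor of 16 and an initial count of 1,000,000, the timer would expire after about 0.16 seconds.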
10-38 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.6.5 Local Interrupt Acceptance When a local interrupt is sent to the processor core, it is subject to the acceptance criteria specified in the interrupt acceptance flow chart in Figure 10-25.
Vol. 3 10-39 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The ICR consists of the following fields. Vector The vector number of the interrupt being sent. Delivery Mode Specifies the type of IPI to be sent. This field is also known as the IPI message type field.
10-40 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) ability for a processor to send a lowest priority IPI is model specific and should be avoided by BIOS and operating system software. 010 (SMI) Delivers an SMI interrupt to the target processor or processors.
Vol. 3 10-41 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Destination Mode Selects either physical (0) or logical (1) destination mode (see Section 10.
10-42 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) destination field set to FH for Pentium and P6 family processors and to FFH for Pentium 4 and Intel Xeon processors. 11: (All Excluding Self) The IPI is sent to all processors in a system with the exception of the processor sending the IPI.
Vol. 3 10-43 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Self Invalid X Lowest Priority, NMI, INIT, SMI, Start-Up X All Including Self Valid Edge Fixed X All Including Self Invalid 2 L.
10-44 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.7.1.1 ICR Operation in x2APIC Mode In x2APIC mode, the layout of the Interrupt Command Register is shown in Figure 10-17.
Vol. 3 10-45 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) ICR in xAPIC mode, except the Delivery Status bit is removed since it is not needed in x2APIC mode.
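Composing an ICR value in either mode can be sketched as follows. The bit positions used (vector 7:0, delivery mode 10:8, destination mode 11, level 14, trigger mode 15, destination shorthand 19:18, destination in bits 63:56 for xAPIC and 63:32 for x2APIC) are the commonly documented layout and are assumptions here, as the figures are not reproduced in this excerpt; the function name is hypothetical.

```python
def make_icr(vector: int, delivery_mode: int, destination: int, *,
             logical: bool = False, level: int = 1, trigger: int = 0,
             shorthand: int = 0, x2apic: bool = False) -> int:
    """Build a 64-bit ICR value (assumed field layout)."""
    low = ((vector & 0xFF)
           | ((delivery_mode & 0x7) << 8)
           | (int(logical) << 11)      # 0: physical, 1: logical
           | ((level & 1) << 14)
           | ((trigger & 1) << 15)
           | ((shorthand & 0x3) << 18))
    if x2apic:
        high = destination & 0xFFFFFFFF     # 32-bit destination ID
    else:
        high = (destination & 0xFF) << 24   # 8-bit ID in the top byte
    # The delivery status bit (12) is left clear in both modes.
    return (high << 32) | low
```

A fixed IPI with vector 30H sent to physical APIC ID 2 would differ between the two modes only in where the destination ID is placed.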
10-46 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.7.2 Determining IPI Destination The destination of an IPI can be one, all, or a subset (group) of the processors on the system bus.
Vol. 3 10-47 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) APICs to be addressed on the APIC bus. A broadcast to all local APICs is specified with 0FH. NOTE The number of local APICs that can be addressed on the system bus may be restricted by hardware.
10-48 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The interpretation of MDA for the two models is described in the following paragraphs.
Vol. 3 10-49 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) lowest priority delivery mode is not supported in cluster mode and must not be configured by software. The hierarchical cluster destination model can be used with Pentium 4, Intel Xeon, P6 family, or Pentium processors.
10-50 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) mode is not supported in the x2APIC mode. Hence the Destination Format Register (DFR) is eliminated in x2APIC mode.
Vol. 3 10-51 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.7.2.5 Broadcast/Self Delivery Mode The destination shorthand field of the ICR allows the delivery mode to be by-passed in favor of broadcasting the IPI to all the processors on the system bus and/or back to itself (see Section 10.
10-52 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Here, the TPR value is the task priority value in the TPR (see Figure 10-26), the IRRV value is the vector number for the highest pr.
Vol. 3 10-53 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The SELF IPI register is a write-only register. A RDMSR instruction with address of the SELF IPI register will raise a GP fault.
10-54 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) priorities of the local APICs by resetting the Arb ID register of each agent to its current APIC ID value. (The Pentium 4 and Intel Xeon processors do not implement the Arb ID register.
Vol. 3 10-55 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 3. If the local APIC determines that it is the designated destination for the interrupt but the interrupt request is not one of the interrupts given in step 2, the local APIC sets the appropriate bit in the IRR.
10-56 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 1. (IPIs only) It examines the IPI message to determine if it is the specified destination for the IPI as described in Section 10.
Vol. 3 10-57 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) interrupt, or one of the MP protocol IPI messages (BIPI, FIPI, and SIPI), the interrupt is sent directly to the processor core for handling.
10-58 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) of vectors within a priority group, the vector number is often divided into two parts, with the high 4 bits of the vector indicating its priority and the low 4 bits indicating its ranking within the priority group.
Vol. 3 10-59 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Its value in the PPR is computed as follows: IF TPR[7:4] ≥ ISRV[7:4] THEN PPR[7:0] ← TPR[7:0]; ELSE PPR[7:4] ← ISRV[7:4]; PPR[3:0] ← 0; Here, the ISRV value is the vector number of the highest priority ISR bit that is set, or 00H if no ISR bit is set.
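The PPR rule above can be sketched in C. This is a minimal illustration, not processor or operating-system code; the function name `compute_ppr` is hypothetical, and the TPR and ISRV values are passed in as plain bytes rather than read from APIC registers.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the processor-priority register (PPR) value from the TPR and
 * ISRV, where isrv is the highest in-service vector, or 0 if no ISR bit
 * is set. */
static uint8_t compute_ppr(uint8_t tpr, uint8_t isrv)
{
    if ((tpr >> 4) >= (isrv >> 4))
        return tpr;                 /* PPR[7:0] <- TPR[7:0] */
    return (uint8_t)(isrv & 0xF0);  /* PPR[7:4] <- ISRV[7:4]; PPR[3:0] <- 0 */
}
```

For example, with TPR = 20H and an in-service vector of 35H, the in-service priority class (3) exceeds the task priority class (2), so the sketch yields a PPR of 30H.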
10-60 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The IRR contains the active interrupt requests that have been accepted, but not yet dispatched to the processor for servicing. When the local APIC accepts an interrupt, it sets the bit in the IRR that corresponds to the vector of the accepted interrupt.
Vol. 3 10-61 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) bit is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an EOI message is sent to all I/O APICs.
10-62 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) • Loading the TPR with a value of 8 (01000B) blocks all interrupts with a priority of 8 or less while allowing all interrupts with a priority of 9 or more to be recognized. • Loading the TPR with zero enables all external interrupts.
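The blocking behavior in these bullets can be modeled with a one-line comparison. The sketch below assumes the TPR value in the bullets is a 4-bit priority class and that an interrupt's priority is the high 4 bits of its vector; `tpr_allows` is a hypothetical helper name, not an architectural term.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if an interrupt with the given vector would be recognized
 * when the task priority class is tpr_class (0 through 15). An interrupt
 * is blocked when its priority class is less than or equal to tpr_class. */
static bool tpr_allows(uint8_t tpr_class, uint8_t vector)
{
    assert(tpr_class <= 15);
    return (uint8_t)(vector >> 4) > tpr_class;
}
```

With a class of 8, vector 8FH (priority 8) is held off while vector 90H (priority 9) is recognized, matching the first bullet.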
Vol. 3 10-63 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) There are no ordering mechanisms between direct updates of the APIC.TPR and CR8. Operating software should implement either direct APIC TPR updates or CR8 style TPR updates but not mix them.
10-64 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.11 APIC BUS MESSAGE PASSING MECHANISM AND PROTOCOL (P6 FAMILY, PENTIUM PROCESSORS) The Pentium 4 and Intel Xeon processors pass messages among the local and I/O APICs on the system bus, using the system bus message passing mechanism and protocol.
Vol. 3 10-65 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) the bus regardless of its sender’s arbitration priority, unless more than one APIC issues an EOI message simultaneously. In the latter case, the APICs sending the EOI messages arbitrate using their arbitration priorities.
10-66 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 10.12.1 Message Address Register Format The format of the Message Address Register (lower 32-bits) is shown in Figure 10-32. Fields in the Message Address Register are as follows: 1.
Vol. 3 10-67 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) destination mode and only the processor in the system that has the matching APIC ID is considered for delivery of that interrupt (this means no re-direction).
10-68 Vol. 3 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Reserved fields are not assumed to be any value. Software must preserve their contents on writes. Other fields in the Message Data Register are described below. 1. Vector — This 8-bit field contains the interrupt vector associated with the message.
Vol. 3 11-1 CHAPTER 11 MEMORY CACHE CONTROL This chapter describes the memory cache and cache control mechanisms, the TLBs, and the store buffer in Intel 64 and IA-32 processors. It also describes the memory type range registers (MTRRs) introduced in the P6 family processors and how they are used to control caching of physical memory locations.
11-2 Vol. 3 MEMORY CACHE CONTROL Figure 11-2 shows the cache arrangement of the Intel Core i7 processor. Figure 11-2. Cache Structure of the Intel Core i7 Processors Table 11-1.
Vol. 3 11-3 MEMORY CACHE CONTROL L1 Data Cache • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst microarchitecture): 8-KByte, 4-way set associative, 64-byte cache line size. • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst microarchitecture): 16-KByte, 8-way set associative, 64-byte cache line size.
11-4 Vol. 3 MEMORY CACHE CONTROL Instruction TLB (4-KByte Pages) • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst microarchitecture): 128 entries, 4-way set associative. • Intel Atom processors: 32 entries, fully associative.
Vol. 3 11-5 MEMORY CACHE CONTROL Intel 64 and IA-32 processors may implement four types of caches: the trace cache, the level 1 (L1) cache, the level 2 (L2) cache, and the level 3 (L3) cache.
11-6 Vol. 3 MEMORY CACHE CONTROL • Pentium 4 and Intel Xeon processors Based on Intel NetBurst microarchitecture — The trace cache caches decoded instructions (μops) from the instruction decoder and the L1 cache contains data. The L2 and L3 caches are unified data and instruction caches located on the processor chip.
Vol. 3 11-7 MEMORY CACHE CONTROL Processors based on Intel Core microarchitectures implement one level of instruction TLB and two levels of data TLB. The Intel Core i7 processor provides a second-level unified TLB. The store buffer is associated with the processor’s instruction execution units.
11-8 Vol. 3 MEMORY CACHE CONTROL (depending on the write policy currently in force) can also write it out to memory. If the operand is to be written out to memory, it is written first into the store buffer, and then written from the store buffer to memory when the system bus is available.
Vol. 3 11-9 MEMORY CACHE CONTROL registers to access UC memory that may have read or write side effects. • Uncacheable (UC-) — Has same characteristics as the strong uncacheable (UC) memory type, except that this memory type can be overridden by programming the MTRRs for the WC memory type.
11-10 Vol. 3 MEMORY CACHE CONTROL possible) and through to system memory. When writing through to memory, invalid cache lines are never filled, and valid cache lines are either filled or invalidated.
Vol. 3 11-11 MEMORY CACHE CONTROL 11.3.1 Buffering of Write Combining Memory Locations Writes to the WC memory type are not cached in the typical sense of the word cached. They are retained in an internal write combining buffer (WC buffer) that is separate from the internal L1, L2, and L3 caches and the store buffer.
11-12 Vol. 3 MEMORY CACHE CONTROL The WC memory type is weakly ordered by definition. Once the eviction of a WC buffer has started, the data is subject to the weak ordering semantics of its definition.
Vol. 3 11-13 MEMORY CACHE CONTROL large data structure should be marked as uncacheable, or reading it will evict cached lines that the processor will be referencing again. A similar example would be a write-only data structure that is written to (to export the data to another agent), but never read by software.
11-14 Vol. 3 MEMORY CACHE CONTROL The L1 instruction cache in P6 family processors implements only the “SI” part of the MESI protocol, because the instruction cache is not writable. The instruction cache monitors changes in the data cache to maintain consistency between the caches when instructions are modified.
Vol. 3 11-15 MEMORY CACHE CONTROL 11.5.1 Cache Control Registers and Bits Figure 11-3 depicts cache-control mechanisms in IA-32 processors. Other than for the matter of memory address space, these work the same in Intel 64 processors.
11-16 Vol. 3 MEMORY CACHE CONTROL Figure 11-3. Cache-Control Registers and Bits Available in Intel 64 and IA-32 Processors (figure labels: Page-Directory or Page-Table Entry, TLBs, MTRRs, Physical Memory)
Vol. 3 11-17 MEMORY CACHE CONTROL Table 11-5. Cache Operating Modes CD NW Caching and Read/Write Policy L1 L2/L3 1 0 0 Normal Cache Mode. Highest performance cache operation. • Read hits access the cache; read misses may cause replacement.
11-18 Vol. 3 MEMORY CACHE CONTROL • NW flag, bit 29 of control register CR0 — Controls the write policy for system memory locations (see Section 2.
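As an illustration of how the CD and NW flags combine into an operating mode, the following sketch classifies a CR0 value. The listing above confirms only the normal mode (CD = 0, NW = 0) and the NW bit position; the CD bit position (bit 30) and the meaning of the remaining combinations are stated here from general knowledge of Table 11-5 and should be checked against the full table. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_NW (1u << 29)  /* not write-through flag */
#define CR0_CD (1u << 30)  /* cache disable flag (bit position assumed) */

enum cache_mode {
    CACHE_NORMAL,          /* CD = 0, NW = 0 */
    CACHE_INVALID_SETTING, /* CD = 0, NW = 1: assumed to be an invalid setting */
    CACHE_NO_FILL,         /* CD = 1, NW = 0 */
    CACHE_NO_FILL_NW       /* CD = 1, NW = 1: assumed, coherency not maintained */
};

/* Classify the cache operating mode encoded in the low 32 bits of CR0. */
static enum cache_mode cr0_cache_mode(uint32_t cr0)
{
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;

    if (!cd)
        return nw ? CACHE_INVALID_SETTING : CACHE_NORMAL;
    return nw ? CACHE_NO_FILL_NW : CACHE_NO_FILL;
}
```

Reading CR0 itself requires privileged code; the sketch takes the value as an argument so that the decoding can be exercised anywhere.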
Vol. 3 11-19 MEMORY CACHE CONTROL corrupt addresses. • PCD flag in the page-directory and page-table entries — Controls caching for individual page tables and pages, respectively (see Section 4.9, “Paging and Memory Typing”). This flag only has effect when paging is enabled and the CD flag in control register CR0 is clear.
11-20 Vol. 3 MEMORY CACHE CONTROL page-table entries) permit caching in an external L2 cache to be controlled on a page-by-page basis, consistent with the control exercised on the L1 cache of these processors. The P6 and more recent processor families do not provide these pins because the L2 cache is internal to the chip package.
Vol. 3 11-21 MEMORY CACHE CONTROL When normal caching is in effect, the effective memory type shown in Table 11-6 is determined using the following rules: 1. If the PCD and PWT attributes for the page are both 0, then the effective memory type is identical to the MTRR-defined memory type.
11-22 Vol. 3 MEMORY CACHE CONTROL 11.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Intel Core Solo, Pentium M, Pentium 4, Intel Xeon, and Pentium III processors use the PAT to select effective page-level memory types.
Vol. 3 11-23 MEMORY CACHE CONTROL 11.5.2.3 Writing Values Across Pages with Different Memory Types If two adjoining pages in memory have different memory types, and a word or longer operand is written to a memory location that crosses the page boundary between those two pages, the operand might be written to memory twice.
11-24 Vol. 3 MEMORY CACHE CONTROL 11.5.3 Preventing Caching To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps: 1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.
Vol. 3 11-25 MEMORY CACHE CONTROL 11.5.4 Disabling and Enabling the L3 Cache On processors based on Intel NetBurst microarchitecture, the third-level cache can be disabled by bit 6 of the IA32_MISC_ENABLE MSR.
11-26 Vol. 3 MEMORY CACHE CONTROL The CLFLUSH instruction allows selected cache lines to be flushed from the caches. This instruction gives a program the ability to explicitly free up cache space when it is known that a cached section of system memory will not be accessed in the near future.
Vol. 3 11-27 MEMORY CACHE CONTROL on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology. 11.6 SELF-MODIFYING CODE A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated.
11-28 Vol. 3 MEMORY CACHE CONTROL To avoid problems related to implicit caching, the operating system must explicitly invalidate the cache when changes are made to cacheable data that the cache coherency mechanism does not automatically handle.
Vol. 3 11-29 MEMORY CACHE CONTROL 11.9 INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS) The processor updates its address translation caches (TLBs) transparently to software. Several mechanisms are available, however, that allow software and hardware to invalidate the TLBs either explicitly or as a side effect of another operation.
11-30 Vol. 3 MEMORY CACHE CONTROL The discussion of write ordering in Section 8.2, “Memory Ordering,” gives a detailed description of the operation of the store buffer. 11.11 MEMORY TYPE RANGE REGISTERS (MTRRS) The following section pertains only to the P6 and more recent processor families.
Vol. 3 11-31 MEMORY CACHE CONTROL Reserved*: 03H; Write-through (WT): 04H; Write-protected (WP): 05H; Writeback (WB): 06H; Reserved*: 7H through FFH. NOTE: * Use of these encodings results in a general-protection exception (#GP). Figure 11-4. Mapping Physical Memory With MTRRs Table 11-8.
11-32 Vol. 3 MEMORY CACHE CONTROL 11.11.1 MTRR Feature Identification The availability of the MTRR feature is model-specific. Software can determine if MTRRs are supported on a processor by executing the CPUID instruction and reading the state of the MTRR flag (bit 12) in the feature information register (EDX).
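A hedged sketch of the feature check follows. The CPUID test uses the EDX bit given above. The IA32_MTRRCAP decoding is an assumption based on the commonly documented layout (VCNT in bits 7:0, FIX in bit 8, WC in bit 10, SMRR in bit 11); this listing confirms only the SMRR bit. All names are hypothetical, and register values are passed in as arguments rather than read with CPUID or RDMSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the MTRR flag (bit 12) is set in the EDX value from CPUID leaf 1. */
static bool cpuid_edx_has_mtrr(uint32_t edx)
{
    return ((edx >> 12) & 1u) != 0;
}

struct mtrr_cap {
    unsigned vcnt;  /* number of variable-range MTRR pairs (assumed bits 7:0) */
    bool fix;       /* fixed-range MTRRs supported (assumed bit 8) */
    bool wc;        /* write-combining type supported (assumed bit 10) */
    bool smrr;      /* SMRR interface supported (bit 11) */
};

/* Decode an IA32_MTRRCAP value under the layout assumptions above. */
static struct mtrr_cap decode_mtrr_cap(uint64_t msr)
{
    struct mtrr_cap cap;
    cap.vcnt = (unsigned)(msr & 0xFF);
    cap.fix  = ((msr >> 8) & 1) != 0;
    cap.wc   = ((msr >> 10) & 1) != 0;
    cap.smrr = ((msr >> 11) & 1) != 0;
    return cap;
}
```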
Vol. 3 11-33 MEMORY CACHE CONTROL 11.11.2 Setting Memory Ranges with MTRRs The memory ranges and the types of memory specified in each range are set by three groups of registers: the IA32_MTRR_DEF_TYPE MSR, the fixed-range MTRRs, and the variable-range MTRRs.
11-34 Vol. 3 MEMORY CACHE CONTROL memory. When this flag is set, the FE flag can disable the fixed-range MTRRs; when the flag is clear, the FE flag has no effect. When the E flag is set, the type specified in the default memory type field is used for areas of memory not already mapped by either a fixed or variable MTRR.
Vol. 3 11-35 MEMORY CACHE CONTROL Figure 11-7 shows flags and fields in these registers. The functions of these flags and fields are: • Type field, bits 0 through 7 — Specifies the memory type for the range (see Table 11-8 for the encoding of this field).
11-36 Vol. 3 MEMORY CACHE CONTROL — The width of the PhysMask field depends on the maximum physical address size supported by the processor. CPUID.
Vol. 3 11-37 MEMORY CACHE CONTROL NOTE It is possible for software to parse the memory descriptions that BIOS provides by using the ACPI/INT15 e820 interface mechanism.
11-38 Vol. 3 MEMORY CACHE CONTROL Before attempting to access these SMRR registers, software must test bit 11 in the IA32_MTRRCAP register. If SMRR is not supported, reads from or writes to registers cause general-protection exceptions.
Vol. 3 11-39 MEMORY CACHE CONTROL 3FFFFFH (2 MBytes to 4 MBytes), a mask value of FFFE00000H is required. Again, the 12 least-significant bits of this mask value are truncated, so that the value entered in the PhysMask field of IA32_MTRR_PHYSMASK3 is FFFE00H.
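The mask arithmetic in this example can be reproduced with a short helper. The sketch assumes a power-of-2 range size and takes the maximum physical address width as a parameter (36 bits reproduces the FFFE00H value above); `mtrr_physmask_field` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the PhysMask field for a range of the given size: the mask
 * selects every address bit above the range size, limited to
 * maxphyaddr bits, with the 12 least-significant bits truncated. */
static uint64_t mtrr_physmask_field(uint64_t size, unsigned maxphyaddr)
{
    assert(size >= 4096 && (size & (size - 1)) == 0); /* power of 2, >= 4 KBytes */
    assert(maxphyaddr >= 32 && maxphyaddr <= 52);

    uint64_t addr_mask = (1ULL << maxphyaddr) - 1;
    return (~(size - 1) & addr_mask) >> 12;
}
```

For a 2-MByte range on a processor with 36-bit physical addresses, the full mask is FFFE00000H and the field value is FFFE00H, as in the text.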
11-40 Vol. 3 MEMORY CACHE CONTROL IA32_MTRR_PHYSBASE5 = 0000 0000 A000 0001H IA32_MTRR_PHYSMASK5 = 0000 000F FF80 0800H Caches A0000000-A0800000 as WC type.
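To see how such a register pair selects addresses, the following sketch applies the usual base/mask comparison to the values shown for pair 5. It assumes the valid flag is bit 11 of the mask register and the type field is bits 7:0 of the base register; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTRR_PHYSMASK_VALID (1ULL << 11)  /* valid flag in IA32_MTRR_PHYSMASKn */

/* Return true if addr falls in the range described by a variable-range
 * MTRR pair: the address and the base must agree on every masked bit. */
static bool mtrr_pair_matches(uint64_t physbase, uint64_t physmask, uint64_t addr)
{
    if (!(physmask & MTRR_PHYSMASK_VALID))
        return false;                      /* pair is disabled */

    uint64_t mask = physmask & ~0xFFFULL;  /* drop the flag bits */
    uint64_t base = physbase & ~0xFFFULL;  /* drop the type field */
    return (addr & mask) == (base & mask);
}

/* Extract the memory type encoding from a PHYSBASE register value. */
static unsigned mtrr_pair_type(uint64_t physbase)
{
    return (unsigned)(physbase & 0xFF);
}
```

With the values above, addresses from A0000000H up to A07FFFFFH match, the type field decodes to 01H (WC), and A0800000H is the first address outside the range.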
Vol. 3 11-41 MEMORY CACHE CONTROL 11.11.4 Range Size and Alignment Requirement A range that is to be mapped to a variable-range MTRR must meet the following “power of 2” size and alignment rules: 1. The minimum range size is 4 KBytes and the base address of the range must be on at least a 4-KByte boundary.
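A range can be screened against these rules before an MTRR pair is programmed. The sketch below encodes one reading of the “power of 2” rules: the size is a power of 2 of at least 4 KBytes, and the base is aligned to the size. Only rule 1 is visible in this listing, so the alignment-to-size condition is an assumption; `mtrr_range_ok` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check a candidate variable-range MTRR mapping against the size and
 * alignment rules as interpreted above. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
    if (size < 4096)
        return false;                 /* below the 4-KByte minimum */
    if ((size & (size - 1)) != 0)
        return false;                 /* not a power of 2 */
    return (base & (size - 1)) == 0;  /* base aligned to the range size */
}
```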
11-42 Vol. 3 MEMORY CACHE CONTROL the MTRRs according to known types of memory, including memory on devices that it auto-configures. Initialization is expected to occur prior to booting the operating system. See Section 11.11.8, “MTRR Considerations in MP Systems,” for information on initializing MTRRs in MP (multiple-processor) systems.
Vol. 3 11-43 MEMORY CACHE CONTROL automatically aligns the base address and size to 4-KByte boundaries. Pseudocode for the MemTypeGet() function is given in Example 11-4. Example 11-4. MemTypeGet() Pseudocode #define MIXED_TYPES -1 /* MIXED_TYPES < 0 || MIXED_TYPES > 256 */ IF CPU_FEATURES.
11-44 Vol. 3 MEMORY CACHE CONTROL Example 11-5. Get4KMemType() Pseudocode IF IA32_MTRRCAP.FIX AND MTRRdefType.FE /* fixed registers enabled */ THEN IF PHY_ADDRESS is within a fixed range return IA32_MTRR_FIX.Type; FI; FOR each variable-range MTRR in IA32_MTRRCAP.
Vol. 3 11-45 MEMORY CACHE CONTROL THEN pre_mtrr_change(); update affected MTRR; post_mtrr_change(); FI; ELSE (* try to map using a variable MTRR pair *) IF IA32_MTRRCAP.
11-46 Vol. 3 MEMORY CACHE CONTROL END The physical address to variable range mapping algorithm in the MemTypeSet function detects conflicts with current variable range registers by cycling through them and determining whether the physical address in question matches any of the current ranges.
Vol. 3 11-47 MEMORY CACHE CONTROL 4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.) 5. Flush all caches using the WBINVD instruction. Note that on a processor that supports self-snooping (CPUID feature flag bit 27), this step is unnecessary.
11-48 Vol. 3 MEMORY CACHE CONTROL The requirement that all 4-KByte ranges in a large page are of the same memory type implies that large pages with different memory types may suffer a performance penalty, since they must be marked with the lowest common denominator memory type.
Vol. 3 11-49 MEMORY CACHE CONTROL 11.12.2 IA32_PAT MSR The IA32_PAT MSR is located at MSR address 277H (see Appendix B, “Model-Specific Registers (MSRs)”), and this address will remain the same on future IA-32 processors that support the PAT feature.
11-50 Vol. 3 MEMORY CACHE CONTROL 11.12.3 Selecting a Memory Type from the PAT To select a memory type for a page from the PAT, a 3-bit index made up of the PAT, PCD, and PWT bits must be encoded in the page-table or page-directory entry for the page.
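The index construction can be sketched as follows. The ordering (PAT as the most-significant bit, PWT as the least) and the use of the low 3 bits of each 8-bit PAT entry are stated here as assumptions consistent with the usual description of the IA32_PAT MSR; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build the 3-bit PAT index from the PAT, PCD, and PWT bits of a
 * page-table or page-directory entry. */
static unsigned pat_index(bool pat, bool pcd, bool pwt)
{
    return ((unsigned)pat << 2) | ((unsigned)pcd << 1) | (unsigned)pwt;
}

/* Return the memory type encoding held in entry `index` of an IA32_PAT
 * value: entries are 8 bits apart and the type is in the low 3 bits. */
static unsigned pat_entry_type(uint64_t ia32_pat, unsigned index)
{
    assert(index < 8);
    return (unsigned)((ia32_pat >> (index * 8)) & 0x7);
}
```

As a usage example, with an IA32_PAT value of 0007040600070406H (assumed here to be the power-on default), a page with PAT = 0, PCD = 0, and PWT = 1 selects entry 1, which holds encoding 04H (WT).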
Vol. 3 11-51 MEMORY CACHE CONTROL The values in all the entries of the PAT can be changed by writing to the IA32_PAT MSR using the WRMSR instruction. The IA32_PAT MSR is read and write accessible (use of the RDMSR and WRMSR instructions, respectively) to software operating at a CPL of 0.
11-52 Vol. 3 MEMORY CACHE CONTROL 11.12.5 PAT Compatibility with Earlier IA-32 Processors For IA-32 processors that support the PAT, the IA32_PAT MSR is always active.
Vol. 3 12-1 CHAPTER 12 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING This chapter describes those features of the Intel® MMX™ technology that must be considered when designing or enhancing an operating system to support MMX technology.
12-2 Vol. 3 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING result, the MMX register mapping is fixed and is not affected by the value in the Top Of Stack (TOS) field in the floating-point status word (bits 11 through 13).
Vol. 3 12-3 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING • When the EMMS instruction is executed, each tag field in the x87 FPU tag word is set to 11B (empty).
12-4 Vol. 3 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING 12.3 SAVING AND RESTORING THE MMX STATE AND REGISTERS Because the MMX registers are aliased to the x87 FPU data registers, the MMX state can be saved to memory and restored from memory as follows: • Execute an FSAVE, FNSAVE, or FXSAVE instruction to save the MMX state to memory.
Vol. 3 12-5 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING • Execute eight MOVQ instructions to save the contents of the MMX0 through MMX7 registers to memory. An EMMS instruction may then (optionally) be executed to clear the MMX state in the x87 FPU.
12-6 Vol. 3 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING • System exceptions: — Invalid Opcode (#UD), if the EM flag in control register CR0 is set when an MMX instruction is executed (see Section 12.1, “Emulation of the MMX Instruction Set”).
Vol. 3 12-7 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING When the TOS equals 2 (case B in Figure 12-2), ST0 points to the physical location R2.
12-8 Vol. 3 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING.
Vol. 3 13-1 CHAPTER 13 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES This chapter describes system programming features for instruction set extensions operating on the processor state extension known as the SSE state (XMM registers, MXCSR) and for processor extended states.
13-2 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES guidelines for this support. Because SSE/SSE2/SSE3/SSSE3/SSE4 extensions share the same state, experience the same s.
Vol. 3 13-3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES To use the POPCNT instruction, software must check that CPUID.1:ECX.POPCNT[bit 23] = 1. 13.1.3 Checking for Support for the FXSAVE and FXRSTOR Instructions A separate check must be made to ensure that the processor supports FXSAVE and FXRSTOR.
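Both checks reduce to testing one bit of a CPUID leaf 1 output register. In the sketch below, the POPCNT bit position comes from the text above; the FXSR position (CPUID.1:EDX bit 24) is not visible in this listing and is supplied as an assumption. The helper names are hypothetical, and the register values are passed in rather than obtained by executing CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if CPUID.1:ECX.POPCNT[bit 23] is set. */
static bool cpuid1_has_popcnt(uint32_t ecx)
{
    return ((ecx >> 23) & 1u) != 0;
}

/* True if the FXSAVE/FXRSTOR feature flag is set; assumed to be
 * CPUID.1:EDX bit 24. */
static bool cpuid1_has_fxsr(uint32_t edx)
{
    return ((edx >> 24) & 1u) != 0;
}
```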
13-4 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES • OSFXSR and OSXMMEXCPT flags in control register CR4 • SSE/SSE2/SSE3/SSSE3/SSE4 feature flags returned by CPUID • EM, MP, and TS flags in control register CR0 Table 13-1.
Vol. 3 13-5 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES The SIMD floating-point exception mask bits (bits 7 through 12), the flush-to-zero flag (bit 15), the denormals-are-zero flag (bit 6), and the rounding control field (bits 13 and 14) in the MXCSR register should be left in their default values of 0.
13-6 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES to a 16-byte boundary will also generate a general-protection exception, instead of a stack-segment fault exception (#SS). — Page fault (#PF). — Alignment check (#AC).
Vol. 3 13-7 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES — Device not available (#NM). This exception is generated by executing an SSE/SSE2/SSE3/SSSE3/SSE4 instruction when the TS flag (bit 3) of CR0 is set to 1. Other exceptions can occur indirectly due to faulty execution of the above exceptions.
13-8 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES 13.1.6.1 Numeric Error flag and IGNNE# SSE/SSE2/SSE3/SSE4 extensions ignore the NE flag in control register CR0 (that is, treat it as if it were always set) and the IGNNE# pin.
Vol. 3 13-9 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES • Execute a LDMXCSR instruction to restore the state of the MXCSR register from memory.
13-10 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES when a suspended task is resumed (using an FXRSTOR instruction). Here, the x87 FPU/MMX/SSE/SSE2/SSE3/SSE4 state must be saved as part of the task state.
Vol. 3 13-11 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0) or implicitly (using the IA-32 architecture’s native task switching mechanism).
13-12 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available exception handler executes the following pseudo-code.
Vol. 3 13-13 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES — CPUID leaf function 0DH enumerates the list of processor states (including legacy x87 FPU, SSE states and processor extended states), the offset and size of individual save area for each processor extended state.
13-14 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES The XSAVE header is 64 bytes in length and must be aligned on a 64-byte boundary. Therefore, the XSAVE/XRSTOR region must be aligned on a 64-byte boundary. The format of the header is as follows (see Table 13-3): The value of each bit in HEADER.
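One way to satisfy the 64-byte alignment requirement in user-level C is the C11 `aligned_alloc` function. The sketch below is illustrative only: it assumes the required save-area size has already been obtained (for example, from CPUID leaf 0DH), rounds it up to a multiple of 64 because `aligned_alloc` expects the size to be a multiple of the alignment, and zero-fills the area so that the header starts out clear. `xsave_area_alloc` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XSAVE_ALIGN 64u  /* required alignment of the XSAVE/XRSTOR region */

/* Allocate a zeroed, 64-byte-aligned buffer of at least `size` bytes.
 * Returns NULL on failure; the caller releases it with free(). */
static void *xsave_area_alloc(size_t size)
{
    size_t rounded = (size + (XSAVE_ALIGN - 1)) & ~(size_t)(XSAVE_ALIGN - 1);
    void *area = aligned_alloc(XSAVE_ALIGN, rounded);

    if (area != NULL)
        memset(area, 0, rounded);
    return area;
}
```

Kernel code would normally use its own allocator with an explicit alignment argument; the point of the sketch is only the alignment and rounding arithmetic.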
Vol. 3 13-15 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES enabled), a value of "1" in the corresponding bit of HEADER.XSTATE_BV causes the processor state to be updated with contents of the save area read from the memory image.
13-16 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES XSAVE and XRSTOR instructions operating on FP or SSE state will cause a #NM (Device Not Available) exception, if CR0.
Vol. 3 13-17 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES 13.8 DETECTION, ENUMERATION, ENABLING PROCESSOR EXTENDED STATE SUPPORT An OS can determine if the XSAVE/XRSTOR/XGETBV/XSETBV instructions and the XFEATURE_ENABLED_MASK register (XCR0) are available in the processor by checking the value of CPUID.
13-18 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES instructions, and provides a more constrained list of features than using all 1's in the save mask.
Vol. 3 13-19 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES If all three requirements are met, applications can use the target new instruction set extensions.
13-20 Vol. 3 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES.
Vol. 3 14-1 CHAPTER 14 POWER AND THERMAL MANAGEMENT This chapter describes facilities of Intel 64 and IA-32 architecture used for power management and thermal monitoring.
14-2 Vol. 3 POWER AND THERMAL MANAGEMENT tools can access model-specific events and report the occurrences of state transitions. 14.2 P-STATE HARDWARE COORDINATION The Advanced Configuration and.
Vol. 3 14-3 POWER AND THERMAL MANAGEMENT • IA32_APERF MSR (0xE8) increments in proportion to actual performance, while accounting for hardware coordination of P-state and TM1/TM2; or software initiated throttling. • The MSRs are per logical processor; they measure performance only when the targeted processor is in the C0 state.
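A common use of this counter pair is to estimate delivered performance as the ratio of the change in IA32_APERF to the change in IA32_MPERF over an interval. The sketch below assumes the two MSRs have been sampled twice by privileged code and shows only the arithmetic; `perf_percent` is a hypothetical name, and the result is a percentage of the reference frequency under that assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate delivered performance over an interval as a percentage:
 * 100 * (change in APERF) / (change in MPERF). Unsigned subtraction
 * tolerates a single counter wraparound. Returns 0 if MPERF did not
 * advance (for example, the processor never entered C0). */
static unsigned perf_percent(uint64_t mperf0, uint64_t aperf0,
                             uint64_t mperf1, uint64_t aperf1)
{
    uint64_t d_mperf = mperf1 - mperf0;
    uint64_t d_aperf = aperf1 - aperf0;

    if (d_mperf == 0)
        return 0;
    /* Assumes d_aperf * 100 does not overflow 64 bits for the interval. */
    return (unsigned)((d_aperf * 100) / d_mperf);
}
```

A result above 100 would indicate that the processor ran faster than the reference frequency during the interval, and a result below 100 that it ran slower or was throttled.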
14-4 Vol. 3 POWER AND THERMAL MANAGEMENT // This example does not cover the additional logic or algorithms // necessary to coordinate multiple logical processors to a target P-state.
Vol. 3 14-5 POWER AND THERMAL MANAGEMENT corresponding enable mechanism is activated, the headroom is available and certain criteria are met. • The opportunistic processor performance operation is generally transparent to most application software.
14-6 Vol. 3 POWER AND THERMAL MANAGEMENT to the OS, it may be undesirable to allow the possibility of the processor delivering increased performance that cannot be sustained after the calibration phase.
Vol. 3 14-7 POWER AND THERMAL MANAGEMENT 14.3.2.4 Application Awareness of Opportunistic Processor Operation (Optional) There may be situations in which an end user or application software wishes to be aware of turbo mode activity.
14-8 Vol. 3 POWER AND THERMAL MANAGEMENT • When the OS timer service transfers control, the application can use RDPMC (with ECX = 4000_0001H) to read IA32_PERF_FIXED_CTR1 (MSR address 30AH) to .
Vol. 3 14-9 POWER AND THERMAL MANAGEMENT Software can program the lowest four bits of IA32_ENERGY_PERF_BIAS MSR with a value from 0 - 15. The values represent a sliding scale, where a value of 0 (the default reset value) corresponds to a hint preference for highest performance and a value of 15 corresponds to the maximum energy savings.
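Updating the hint is a read-modify-write of the low four bits. The sketch below shows only the bit manipulation on a value already read from the MSR; preserving the upper bits is a conservative assumption, and `set_energy_perf_bias` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Return msr_value with its lowest four bits replaced by hint
 * (0 = highest performance, 15 = maximum energy savings). All other
 * bits are preserved. */
static uint64_t set_energy_perf_bias(uint64_t msr_value, unsigned hint)
{
    assert(hint <= 15);
    return (msr_value & ~0xFULL) | (uint64_t)hint;
}
```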
14-10 Vol. 3 POWER AND THERMAL MANAGEMENT Reference, A-M,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A). If CPUID.05H.ECX[Bit 1] = 1, the target processor supports using interrupts as break-events for MWAIT, even when interrupts are disabled.
Vol. 3 14-11 POWER AND THERMAL MANAGEMENT consumption; this is in addition to the reduction offered by automatic thermal monitoring mechanisms. 4. On-die digital thermal sensor and interrupt mechanisms permit the OS to manage thermal conditions natively without relying on BIOS or other system board components.
14-12 Vol. 3 POWER AND THERMAL MANAGEMENT 14.5.1 Catastrophic Shutdown Detector P6 family processors introduced a thermal sensor that acts as a catastrophic shutdown detector. This catastrophic shutdown detector was also implemented in Pentium 4, Intel Xeon and Pentium M processors.
Vol. 3 14-13 POWER AND THERMAL MANAGEMENT Support for TM2 is indicated by CPUID.1:ECX.TM2[bit 8] = 1. 14.5.2.3 Two Methods for Enabling TM2 On processors with CPUID family/model/stepping signat.
14-14 Vol. 3 POWER AND THERMAL MANAGEMENT 14.5.2.4 Performance State Transitions and Thermal Monitoring If the thermal control circuitry (TCC) for thermal monitor (TM1/TM2) is active, writes .
Vol. 3 14-15 POWER AND THERMAL MANAGEMENT After the second temperature sensor has been tripped, the thermal monitor (TM1/TM2) will remain engaged for a minimum time period (on the order of 1 ms).
14-16 Vol. 3 POWER AND THERMAL MANAGEMENT interrupt enable flags in the IA32_THERM_INTERRUPT MSR are cleared (interrupts are disabled) and the thermal LVT entry is set to mask interrupts. This interrupt should be handled either by the operating system or system management mode (SMM) code.
Vol. 3 14-17 POWER AND THERMAL MANAGEMENT The IA32_CLOCK_MODULATION MSR contains the following flag and field used to enable software-controlled clock modulation and to select the clock modulation.
14-18 Vol. 3 POWER AND THERMAL MANAGEMENT clock modulation at the duty cycle specified by TM1 takes precedence, regardless of the setting of the on-demand clock modulation duty cycle. For Hyper-Threading Technology enabled processors, the IA32_CLOCK_MODULATION register is duplicated for each logical processor.
Vol. 3 14-19 POWER AND THERMAL MANAGEMENT 14.5.5.2 Reading the Digital Sensor Unlike traditional analog thermal devices, the output of the digital thermal sensor is a temperature relative to the maximum supported operating temperature of the processor.
14-20 Vol. 3 POWER AND THERMAL MANAGEMENT • PROCHOT# or FORCEPR# Log (bit 3, R/WC0) — Sticky bit that indicates whether PROCHOT# or FORCEPR# has been asserted by another agent on the platform since the last clearing of this bit or a reset.
Vol. 3 14-21 POWER AND THERMAL MANAGEMENT • Reading Valid (bit 31, RO) — Indicates if the digital readout in bits 22:16 is valid. The readout is valid if bit 31 = 1. Changes to temperature can be detected using two thresholds (see Figure 14-12); one is set above and the other below the current temperature.
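Decoding the readout follows directly from these field definitions. In the sketch below, the status value is assumed to have been read from the thermal status MSR by privileged code, and the maximum supported operating temperature is supplied by the caller as `tj_max_c` because this listing does not say where that value comes from. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a thermal status value into a temperature in degrees Celsius.
 * The digital readout (bits 22:16) is the distance below the maximum
 * supported operating temperature. Returns false, leaving *temp_c
 * untouched, if the Reading Valid flag (bit 31) is clear. */
static bool therm_status_temp(uint32_t therm_status, int tj_max_c, int *temp_c)
{
    if ((therm_status & (1u << 31)) == 0)
        return false;

    unsigned below_max = (therm_status >> 16) & 0x7Fu;
    *temp_c = tj_max_c - (int)below_max;
    return true;
}
```

For instance, a readout of 20 with an assumed maximum of 100 degrees corresponds to 80 degrees Celsius.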
14-22 Vol. 3 POWER AND THERMAL MANAGEMENT • Critical Temperature Interrupt Enable (bit 4, R/W) — Enables the generation of an interrupt when the Critical Temperature Detector has detected a critical thermal condition. The recommended response to this condition is a system shutdown.
Vol. 3 15-1 CHAPTER 15 MACHINE-CHECK ARCHITECTURE This chapter describes the machine-check architecture and machine-check exception mechanism found in the Pentium 4, Intel Xeon, and P6 family processors. See Chapter 6, “Interrupt 18—Machine-Check Exception (#MC),” for more information on machine-check exceptions.
15-2 Vol. 3 MACHINE-CHECK ARCHITECTURE 15.2 COMPATIBILITY WITH PENTIUM PROCESSOR The Pentium 4, Intel Xeon, and P6 family processors support and extend the machine-check exception mechanism introduced in the Pentium processor.
Vol. 3 15-3 MACHINE-CHECK ARCHITECTURE Each error-reporting bank is associated with a specific hardware unit (or group of hardware units) in the processor.
15-4 Vol. 3 MACHINE-CHECK ARCHITECTURE Where: • Count field, bits 7:0 — Indicates the number of hardware unit error-reporting banks available in a particular processor implementation.
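A decoder for the capability register might look like the sketch below. The Count field position comes from the text above and bit 10 is referenced elsewhere in this chapter as IA32_MCG_CAP[10]; the positions used for MCG_CTL_P (bit 8) and MCG_TES_P (bit 11) are assumptions that should be verified against the register figure. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mcg_cap {
    unsigned bank_count;  /* Count field, bits 7:0 */
    bool ctl_p;           /* IA32_MCG_CTL present (assumed bit 8) */
    bool bit10;           /* IA32_MCG_CAP[10]: IA32_MCi_CTL2 MSRs exist */
    bool tes_p;           /* MCG_TES_P (assumed bit 11) */
};

/* Decode an IA32_MCG_CAP value under the layout assumptions above. */
static struct mcg_cap decode_mcg_cap(uint64_t msr)
{
    struct mcg_cap cap;
    cap.bank_count = (unsigned)(msr & 0xFF);
    cap.ctl_p = ((msr >> 8) & 1) != 0;
    cap.bit10 = ((msr >> 10) & 1) != 0;
    cap.tes_p = ((msr >> 11) & 1) != 0;
    return cap;
}
```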
Vol. 3 15-5 MACHINE-CHECK ARCHITECTURE Section 15.6), and IA32_MCi_STATUS MSR bits 56:55 are used to report the signaling of uncorrected recoverable errors and whether software must take recovery actions for uncorrected errors. Note that when MCG_TES_P is not set, bits 56:53 of the IA32_MCi_STATUS MSR are model-specific.
15-6 Vol. 3 MACHINE-CHECK ARCHITECTURE 15.3.1.3 IA32_MCG_CTL MSR The IA32_MCG_CTL MSR is present if the capability flag MCG_CTL_P is set in the IA32_MCG_CAP MSR.
Vol. 3 15-7 MACHINE-CHECK ARCHITECTURE encoding of 06H_1AH and onward): the operating system or executive software must not modify the contents of the IA32_MC0_CTL MSR. This MSR is internally aliased to the EBL_CR_POWERON MSR and controls platform-specific error handling features.
15-8 Vol. 3 MACHINE-CHECK ARCHITECTURE introduced with Intel 64 processor having CPUID DisplayFamily_DisplayModel encoding of 06H_1AH. Where: • MCA (machine-check architecture) error code field, bits 15:0 — Specifies the machine-check architecture-defined error code for the machine-check error condition detected.
Vol. 3 15-9 MACHINE-CHECK ARCHITECTURE • If IA32_MCG_CAP[10] is 0, bits 52:38 also contain “Other Information” (in the same sense as bits 37:32).
15-10 Vol. 3 MACHINE-CHECK ARCHITECTURE flag indicates that the error did not affect the processor’s state. Software restarting might be possible. • ADDRV (IA32_MCi_ADDR register valid) flag, bit 58 — Indicates (when set) that the IA32_MCi_ADDR register contains the address where the error occurred (see Section 15.
Vol. 3 15-11 MACHINE-CHECK ARCHITECTURE In Table 15-2, the values in the two left-most columns are IA32_MCi_STATUS[54:53]. If a second event overwrites a previously posted event, the information (as guarded by individual valid bits) in the MCi bank is entirely from the second event.
15-12 Vol. 3 MACHINE-CHECK ARCHITECTURE 15.3.2.4 IA32_MCi_MISC MSRs The IA32_MCi_MISC MSR contains additional information describing the machine-check error if the MISCV flag in the IA32_MCi_STATUS register is set.
Vol. 3 15-13 MACHINE-CHECK ARCHITECTURE • Recoverable Address LSB (bits 5:0): The lowest valid recoverable address bit. Indicates the position of the least significant bit (LSB) of the recoverable error address. For example, if the processor logs bits [43:9] of the address, the LSB sub-field in IA32_MCi_MISC is 01001b (9 decimal).
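As a rough illustration of how the LSB sub-field might be used, the sketch below clears the address bits below the reported LSB position. The function name and the idea of masking the logged address this way are assumptions made for the example; only the field position (bits 5:0) comes from the text.

```python
def mask_recoverable_address(addr_value: int, misc_value: int) -> int:
    """Clear the address bits that lie below the Recoverable Address LSB.

    addr_value: raw IA32_MCi_ADDR content.
    misc_value: raw IA32_MCi_MISC content.
    Both parameter names are hypothetical.
    """
    lsb = misc_value & 0x3F              # bits 5:0 of IA32_MCi_MISC
    return (addr_value >> lsb) << lsb    # drop bits [lsb-1:0]
```

With an LSB of 9, as in the example above, the low nine bits of the logged address are treated as not valid and are zeroed.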
15-14 Vol. 3 MACHINE-CHECK ARCHITECTURE When IA32_MCG_CAP[10] = 1, the IA32_MCi_CTL2 MSR for each bank exists, i.e. reads and writes to these MSRs are supported. However, the signaling interface for corrected MC errors may not be supported in all banks.
Vol. 3 15-15 MACHINE-CHECK ARCHITECTURE 15.3.2.6 IA32_MCG Extended Machine Check State MSRs The Pentium 4 and Intel Xeon processors implement a variable number of extended machine-check state MSRs.
15-16 Vol. 3 MACHINE-CHECK ARCHITECTURE Table 15-5. Extended Machine Check State MSRs In Processors With Support For Intel 64 Architecture MSR Address Description IA32_MCG_RAX 180H Contains state of the RAX register at the time of the machine-check error.
Vol. 3 15-17 MACHINE-CHECK ARCHITECTURE When a machine-check error is detected on a Pentium 4 or Intel Xeon processor, the processor saves the state of the general-purpose registers, the R/EFLAGS register, and the R/EIP in these extended machine-check state MSRs.
15-18 Vol. 3 MACHINE-CHECK ARCHITECTURE processor; the handler must be written to interpret P5_MC_TYPE encodings correctly. 15.4 ENHANCED CACHE ERROR REPORTING Starting with Intel Core Duo processors, cache error reporting was enhanced.
Vol. 3 15-19 MACHINE-CHECK ARCHITECTURE beyond those of threshold-based error reporting (Section 15.4). With threshold-based error reporting, software is limited to use periodic polling to query the status of hardware corrected MC errors.
15-20 Vol. 3 MACHINE-CHECK ARCHITECTURE CMCI interrupt delivery is configured by writing to the LVT CMCI register entry in the local APIC register space at default address of APIC_BASE + 2F0H. A CMCI interrupt can be delivered to more than one logical processor if multiple logical processors are affected by the associated MC errors.
Vol. 3 15-21 MACHINE-CHECK ARCHITECTURE • Delivery status, bit 12 — It is a read-only bit that, when set, indicates that an interrupt from this source has been delivered to the processor core, but has not yet been accepted. • Mask, bit 16 — When set, inhibits reception of the interrupt.
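The LVT CMCI fields listed above can be decoded as in the following sketch. The delivery status (bit 12) and mask (bit 16) positions and the 2F0H offset come from the text; the vector (bits 7:0) and delivery mode (bits 10:8) positions are taken from the general local APIC LVT layout and should be treated as assumptions here, as should all names.

```python
LVT_CMCI_OFFSET = 0x2F0  # offset from APIC_BASE, per the text above


def decode_lvt_cmci(reg: int) -> dict:
    """Split a raw LVT CMCI register value into its fields (names hypothetical)."""
    return {
        "vector": reg & 0xFF,                      # bits 7:0 (assumed layout)
        "delivery_mode": (reg >> 8) & 0x7,         # bits 10:8 (assumed layout)
        "delivery_pending": bool((reg >> 12) & 1), # bit 12, read-only
        "masked": bool((reg >> 16) & 1),           # bit 16
    }
```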
15-22 Vol. 3 MACHINE-CHECK ARCHITECTURE b. Each thread examines the IA32_MCi_CTL2[30] indicator for each bank to determine if another thread has already claimed ownership of that bank. • If IA32_MCi_CTL2[30] had been set by another thread, this thread cannot own bank i and should proceed to step b.
Vol. 3 15-23 MACHINE-CHECK ARCHITECTURE • Write 7FFFH to IA32_MCi_CTL2[15:0], • Read back IA32_MCi_CTL2[15:0]; the lower 15 bits (14:0) are the maximum threshold supported by the processor. b. Increase the threshold to a value below the maximum value discovered using step a.
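The write-then-read-back discovery in step a can be sketched as follows. Real MSR access requires privileged code, so the example takes `rdmsr` and `wrmsr` as caller-supplied callables; those callables, the `FakeBank` stand-in, and the address used as a key are all hypothetical and exist only to make the sketch runnable.

```python
def discover_max_threshold(rdmsr, wrmsr, ctl2_addr):
    """Return the largest threshold value the bank accepts (step a)."""
    original = rdmsr(ctl2_addr)
    # Write 7FFFH into bits 15:0, leaving the upper bits unchanged.
    wrmsr(ctl2_addr, (original & ~0xFFFF) | 0x7FFF)
    # Bits 14:0 of the value read back hold the maximum threshold.
    return rdmsr(ctl2_addr) & 0x7FFF


class FakeBank:
    """Stand-in for hardware that implements only some threshold bits."""

    def __init__(self, supported_bits=0x00FF):
        self.supported_bits = supported_bits
        self.value = 0

    def rdmsr(self, addr):
        return self.value

    def wrmsr(self, addr, value):
        # Unsupported threshold bits are dropped on write.
        self.value = (value & ~0x7FFF) | (value & self.supported_bits)


bank = FakeBank()
max_threshold = discover_max_threshold(bank.rdmsr, bank.wrmsr, 0x280)
```

The helper leaves the threshold field at its maximum; step b would then program the value the software actually wants.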
15-24 Vol. 3 MACHINE-CHECK ARCHITECTURE 15.6.1 Detection of Software Error Recovery Support Software must use bit 24 of IA32_MCG_CAP (MCG_SER_P) to detect the presence of software error recovery support (see Figure 15-2). When IA32_MCG_CAP[24] is set, this indicates that the processor supports software error recovery.
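A capability check along these lines could look like the sketch below. It decodes only the IA32_MCG_CAP fields mentioned in this chapter excerpt: the bank count (bits 7:0), bit 10 (which gates the IA32_MCi_CTL2 MSRs), and bit 24 (MCG_SER_P). The function and key names are illustrative assumptions.

```python
def decode_mcg_cap(value: int) -> dict:
    """Decode selected IA32_MCG_CAP fields from a raw integer value."""
    return {
        "bank_count": value & 0xFF,             # bits 7:0: number of banks
        "ctl2_present": bool((value >> 10) & 1),  # bit 10: IA32_MCi_CTL2 exists
        "ser_p": bool((value >> 24) & 1),       # bit 24: MCG_SER_P
    }


def supports_software_error_recovery(value: int) -> bool:
    """True when IA32_MCG_CAP[24] is set."""
    return decode_mcg_cap(value)["ser_p"]
```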
Vol. 3 15-25 MACHINE-CHECK ARCHITECTURE • S (Signaling) flag, bit 56 - Indicates (when set) that a machine check exception was generated for the UCR error reported in this MC bank and system softwa.
15-26 Vol. 3 MACHINE-CHECK ARCHITECTURE IA32_MCi_STATUS register. Recovery actions for SRAO errors are MCA error code specific. The MISCV and the ADDRV flags in the IA32_MCi_STATUS register are set when the additional error information is available from the IA32_MCi_MISC and the IA32_MCi_ADDR registers.
Vol. 3 15-27 MACHINE-CHECK ARCHITECTURE 15.6.4 UCR Error Overwrite Rules In general, the overwrite rules are as follows: • UCR errors will overwrite corrected errors. • Uncorrected (PCC=1) errors overwrite UCR (PCC=0) errors. • UCR errors are not written over previous UCR errors.
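The three rules listed above can be written as a small decision function. This is a sketch covering only those three rules; any other combination returns `None` because this excerpt does not specify it. The category constants and function name are invented for the example.

```python
from typing import Optional

CORRECTED = "corrected"
UCR = "ucr"                        # uncorrected recoverable, PCC=0
UNCORRECTED_PCC = "uncorrected"    # uncorrected, PCC=1


def second_overwrites_first(first: str, second: str) -> Optional[bool]:
    """Apply the listed overwrite rules; None means 'not covered here'."""
    if first == CORRECTED and second == UCR:
        return True     # UCR errors overwrite corrected errors
    if first == UCR and second == UNCORRECTED_PCC:
        return True     # PCC=1 errors overwrite UCR (PCC=0) errors
    if first == UCR and second == UCR:
        return False    # UCR errors do not overwrite previous UCR errors
    return None
```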
15-28 Vol. 3 MACHINE-CHECK ARCHITECTURE 15.7 MACHINE-CHECK AVAILABILITY The machine-check architecture and machine-check exception (#MC) are model-specific features. Software can execute the CPUID instruction to determine whether a processor implements these features.
Vol. 3 15-29 MACHINE-CHECK ARCHITECTURE (* enables all MCA features *) FI (* Determine number of error-reporting banks supported *) COUNT ← IA32_MCG_CAP.
15-30 Vol. 3 MACHINE-CHECK ARCHITECTURE also write a 16-bit model-specific error code in the IA32_MCi_STATUS register depending on the implementation of the machine-check architecture of the processor. The MCA error codes are architecturally defined for Intel 64 and IA-32 processors.
Vol. 3 15-31 MACHINE-CHECK ARCHITECTURE 15.9.2 Compound Error Codes Compound error codes describe errors related to the TLBs, memory, caches, bus and interconnect logic, and internal timer.
15-32 Vol. 3 MACHINE-CHECK ARCHITECTURE The behavior of error filtering after crossing the yellow threshold is model-specific. 15.9.2.2 Transaction Type (TT) Sub-Field The 2-bit TT sub-field (Table 15-10) indicates the type of transaction (data, instruction, or generic).
Vol. 3 15-33 MACHINE-CHECK ARCHITECTURE caused the error. Eviction and snoop requests apply only to the caches. All of the other requests apply to TLBs, caches and interconnects.
15-34 Vol. 3 MACHINE-CHECK ARCHITECTURE 15.9.2.6 Memory Controller Errors The memory controller errors are defined with the 3-bit MMM (memory transaction type), and 4-bit CCCC (channel) sub-fields. The encodings for MMM and CCCC are defined in Table 15-14.
Vol. 3 15-35 MACHINE-CHECK ARCHITECTURE 15-9). Their values and compound encoding format are given in Table 15-15. Table 15-16 lists values of relevant bit fields of IA32_MCi_STATUS for architecturally defined SRAO errors.
15-36 Vol. 3 MACHINE-CHECK ARCHITECTURE IA32_MCG_STATUS register for the memory scrubbing and L3 explicit writeback errors on both the reporting and non-reporting logical processors. 15.9.3.2 Architecturally Defined SRAR Errors The following two SRAR errors are architecturally defined.
Vol. 3 15-37 MACHINE-CHECK ARCHITECTURE Table 15-19 lists values of relevant bit fields of IA32_MCi_STATUS for architecturally defined SRAR errors.
15-38 Vol. 3 MACHINE-CHECK ARCHITECTURE For Instruction Fetch recoverable error, the affected logical processor should find that the RIPV flag and the EIPV Flag in the IA32_MCG_STATUS register.
Vol. 3 15-39 MACHINE-CHECK ARCHITECTURE • When multiple recoverable errors are reported and no other fatal condition (e.g., overflowed condition for SRAR error) is found for the reported recoverab.
15-40 Vol. 3 MACHINE-CHECK ARCHITECTURE Guidelines for writing a machine-check exception handler or a machine-error logging utility are given in the following sections. 15.10.1 Machine-Check Exception Handler The machine-check exception (#MC) corresponds to vector 18.
Vol. 3 15-41 MACHINE-CHECK ARCHITECTURE generated). If this flag is clear, the processor may still be able to be restarted (for debugging purposes) but not without loss of program continuity.
15-42 Vol. 3 MACHINE-CHECK ARCHITECTURE When machine-check exceptions are enabled for the Pentium processor (MCE flag is set in control register CR4), the machine-check except.
Vol. 3 15-43 MACHINE-CHECK ARCHITECTURE AND PCC flag in IA32_MCi_STATUS = 1 OR RIPV flag in IA32_MCG_STATUS = 0 (* execution is not restartable *) THEN RESTARTABILITY = FALSE; return RESTARTABILITY to calling procedure; FI; Save time-stamp counter and processor ID; Set IA32_MCi_STATUS to all 0s; Execute serializing instruction (i.
15-44 Vol. 3 MACHINE-CHECK ARCHITECTURE mechanism to indicate the frequency of exceptions. A multiprocessing operating system stores the identity of the processor node incurring the exception using a unique identifier, such as the processor’s APIC ID (see Section 10.
Vol. 3 15-45 MACHINE-CHECK ARCHITECTURE was corrected (UC=0) or uncorrected (UC=1). The MCE handler can optionally log and clear the corrected errors in the MC banks if it can implement software algorithm to avoid the undesired race conditions with the CMCI or CMC polling handler.
15-46 Vol. 3 MACHINE-CHECK ARCHITECTURE AR flag to find the type of the UCR error for software recovery and determine if software error recovery is possible.
Vol. 3 15-47 MACHINE-CHECK ARCHITECTURE • When the OVER flag in the IA32_MCi_STATUS register is set for the SRAR error (VAL=1, UC=1, EN=1, PCC=0, S=1 and AR=1), the MCE handler cannot take recovery action as the information of the SRAR error in the IA32_MCi_STATUS register was potentially lost due to the overflow condition.
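The SRAR condition and the overflow check described above can be expressed as two predicates. This is a sketch, not the handler itself. The S (bit 56), AR (bit 55) and ADDRV (bit 58) positions appear in this chapter excerpt; the positions used for VAL (63), OVER (62), UC (61), EN (60) and PCC (57) follow the usual IA32_MCi_STATUS layout and are assumptions here, as are the function names.

```python
# Bit positions in IA32_MCi_STATUS (several are assumptions, see lead-in).
VAL, OVER, UC, EN, PCC, S, AR = 63, 62, 61, 60, 57, 56, 55


def _flag(status: int, bit: int) -> bool:
    return bool((status >> bit) & 1)


def is_srar(status: int) -> bool:
    """VAL=1, UC=1, EN=1, PCC=0, S=1 and AR=1."""
    return (_flag(status, VAL) and _flag(status, UC) and _flag(status, EN)
            and not _flag(status, PCC)
            and _flag(status, S) and _flag(status, AR))


def can_attempt_srar_recovery(status: int) -> bool:
    """An SRAR error is only actionable when OVER is clear."""
    return is_srar(status) and not _flag(status, OVER)
```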
15-48 Vol. 3 MACHINE-CHECK ARCHITECTURE RESTARTABILITY = FALSE; FI FI; IF RESTARTABILITY = FALSE THEN Report RESTARTABILITY to console; Reset system; FI; IF MCA_BROADCAST = TRUE THEN IF ProcessorCoun.
Vol. 3 15-49 MACHINE-CHECK ARCHITECTURE IF PCC Flag in IA32_MCi_STATUS = 1 THEN (* processor context might have been corrupted *) RESTARTABILITY = FALSE; ELSE (* It is a uncorrected recoverable (UCR .
15-50 Vol. 3 MACHINE-CHECK ARCHITECTURE If MISCV in IA32_MCi_STATUS THEN SAVE IA32_MCi_MISC; FI; IF ADDRV in IA32_MCi_STATUS THEN SAVE IA32_MCi_ADDR; FI; IF CLEAR_MC_BANK = TRUE THEN SET all 0 to IA3.
Vol. 3 15-51 MACHINE-CHECK ARCHITECTURE before these errors are actually handled and processed by the MCE handler for attempted software error recovery.
15-52 Vol. 3 MACHINE-CHECK ARCHITECTURE.
Vol. 3 16-1 CHAPTER 16 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER Intel 64 and IA-32 architectures provide debug facilities for use in debugging code and monitoring performance. These facilities are valuable for debugging application software, system software, and multitasking operating systems.
16-2 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER instruction is an alternative way to set code breakpoints. It is especially useful when more than four breakpoints are desired, or when breakpoints are being placed in the source code.
Vol. 3 16-3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • Whether the breakpoint condition was present when the debug exception was generated. The following paragraphs describe the functions of flags and fields in the debug registers. Figure 16-1.
16-4 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.2.1 Debug Address Registers (DR0-DR3) Each of the debug-address registers (DR0 through DR3) holds the 32-bit linear address of a breakpoint (see Figure 16-1). Breakpoint comparisons are made before physical address translation occurs.
Vol. 3 16-5 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER exceptions, debug handlers should clear the register before returning to the interrupted task. 16.2.4 Debug Control Register (DR7) The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions (see Figure 16-1).
16-6 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 10 — Break on I/O reads or writes. 11 — Break on data reads or writes but not instruction fetches.
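To show how the R/W encodings fit into DR7, the sketch below builds a DR7 value for one breakpoint. The encodings 10 and 11 are listed above; the 00 and 01 encodings, the LEN encodings, and the bit positions (Ln at bit 2n, Gn at bit 2n+1, R/Wn at bits 17+4n:16+4n, LENn at bits 19+4n:18+4n) follow the standard DR7 layout in Figure 16-1, which is not reproduced in this excerpt. All Python names are illustrative.

```python
RW_EXECUTE = 0b00      # instruction execution only (from the full layout)
RW_WRITE = 0b01        # data writes only (from the full layout)
RW_IO = 0b10           # I/O reads or writes
RW_READ_WRITE = 0b11   # data reads or writes, not instruction fetches

LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11}  # breakpoint length in bytes


def dr7_enable(dr7: int, index: int, rw: int, length: int,
               local: bool = True) -> int:
    """Return `dr7` with breakpoint `index` (0..3) configured and enabled."""
    if not 0 <= index <= 3:
        raise ValueError("breakpoint index must be 0..3")
    shift = 16 + 4 * index
    dr7 &= ~(0xF << shift)                              # clear R/Wn and LENn
    dr7 |= (rw | (LEN_ENCODING[length] << 2)) << shift  # set R/Wn and LENn
    dr7 |= 1 << (2 * index + (0 if local else 1))       # set Ln or Gn
    return dr7
```

For example, a 4-byte read/write breakpoint in slot 0 sets bits 19:16 and the L0 bit.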
Vol. 3 16-7 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER the lower address bits in the debug registers. Unaligned data or I/O breakpoint addresses do not yield valid results.
16-8 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.2.6 Debug Registers and Intel® 64 Processors For Intel 64 architecture processors, debug registers DR0–DR7 are 64 bits. In 16-bit or 32-bit modes (protected mode and compatibility mode), writes to a debug register fill the upper 32 bits with zeros.
Vol. 3 16-9 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.3 DEBUG EXCEPTIONS The Intel 64 and IA-32 architectures dedicate two interrupt vectors to handling debug exceptions: vector 1 (debug exception, #DB) and vector 3 (breakpoint exception, #BP).
16-10 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER See also: Chapter 6, “Interrupt 1—Debug Exception (#DB),” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.
Vol. 3 16-11 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER (resume flag) in the EFLAGS register (see Section 2.3, “System Flags and Fields in the EFLAGS Register,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A).
16-12 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.3.1.2 Data Memory and I/O Breakpoint Exception Conditions Data memory and I/O breakpoints are reported when the processor at.
Vol. 3 16-13 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER single-step trap does not occur until after the instruction that follows the POPF instruction.
16-14 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING OVERVIEW P6 family processors introduced the ability to set breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next.
Vol. 3 16-15 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER in the last branch record (LBR) stack. For more information, see the Section 16.5.
16-16 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • FREEZE_LBRS_ON_PMI flag (bit 11) — When set, the LBR stack is frozen on a hardware PMI request (e.
Vol. 3 16-17 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER a bug to a particular block of code before instruction single-stepping further narrows the search. If the BTF flag is set when the processor generates a debug exception, the processor clears the BTF flag along with the TF flag.
16-18 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4.6 CPL-Qualified Branch Trace Mechanism CPL-qualified branch trace mechanism is available to a subset of Intel 64 and IA-32 processors that support the branch trace storing mechanism.
Vol. 3 16-19 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4.8 LBR Stack The last branch record stack and top-of-stack (TOS) pointer MSRs are supported across Intel 64 and IA-32 processor families.
16-20 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4.8.1 LBR Stack and Intel® 64 Processors LBR MSRs are 64-bits. If IA-32e mode is disabled, only the lower 32-bits of the address is recorded. If IA-32e mode is enabled, the processor writes 64-bit values into the MSR.
Vol. 3 16-21 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4.8.3 Last Exception Records and Intel 64 Architecture Intel 64 and IA-32 processors also provide MSRs that store the branch record for the last branch taken prior to an exception or an interrupt.
16-22 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER and is cleared on processor RESET and INIT. DS recording is available in real address mode.
Vol. 3 16-23 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • PEBS absolute maximum — Linear address of the next byte past the end of the PEBS buffer.
16-24 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • PEBS counter reset value — A 40-bit value that the counter is to be reset to after state information has been collected following counter overflow. This value allows state information to be collected after a preset number of events have been counted.
Vol. 3 16-25 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4.9.1 DS Save Area and IA-32e Mode Operation When IA-32e mode is active (IA32_EFER.LMA = 1), the structure of the DS save area is shown in Figure 16-8. The organization of each field in IA-32e mode operation is similar to that of non-IA-32e mode operation.
16-26 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER When IA-32e mode is active, the structure of a branch trace record is similar to that shown in Figure 16-6, but each field is 8 bytes in length. This makes each BTS record 24 bytes (see Figure 16-9).
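A buffer of 24-byte BTS records can be walked with the standard `struct` module, as in this sketch. The record size and the 8-byte field width come from the text; the little-endian byte order matches the processor, and the interpretation of the three fields (branch-from address, branch-to address, flags) is an assumption based on Figure 16-6, which is not reproduced in this excerpt.

```python
import struct

# Three unsigned 8-byte little-endian fields per record: 24 bytes total.
BTS_RECORD_64 = struct.Struct("<QQQ")


def parse_bts_records(buf: bytes) -> list:
    """Return a list of (from_ip, to_ip, flags) tuples from a BTS buffer."""
    if len(buf) % BTS_RECORD_64.size:
        raise ValueError("buffer is not a whole number of 24-byte records")
    return [BTS_RECORD_64.unpack_from(buf, offset)
            for offset in range(0, len(buf), BTS_RECORD_64.size)]
```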
Vol. 3 16-27 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER Fields in the buffer management area of a DS save area are described in Section 16.
16-28 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER The procedures used to program IA32_DEBUG_CTRL MSR to set up a BTS buffer or a CPL-qualified BTS are described in Section 16.
Vol. 3 16-29 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • It is recommended that the buffer size for the BTS buffer and the PEBS buffer be an integer multiple of the corresponding record sizes.
16-30 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 2. Set the TR and BTS flags in the IA32_DEBUGCTL for Intel Core Solo and Intel Core Duo processors or later processors (or MSR_DEBUGCTLA MSR for processors based on Intel NetBurst Microarchitecture; or MSR_DEBUGCTLB for Pentium M processors).
Vol. 3 16-31 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.4.9.5 Writing the DS Interrupt Service Routine The BTS, non-precise event-based sampling, and PEBS facilities share the same interrupt vector and interrupt service routine (called the debug store interrupt service routine or DS ISR).
16-32 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • The ISR must clear the mask bit in the performance counter LVT entry. • The ISR must re-enable the counters to count via I.
Vol. 3 16-33 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.5.1 LBR Stack The last branch record stack and top-of-stack (TOS) pointer MSRs are supported across Intel Core 2, Intel Xeon and Intel Atom processor families.
16-34 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • Branch trace store and CPL-qualified BTS — See Section 16.4.6 and Section 16.4.5. • FREEZE_LBRS_ON_PMI flag (bit 11) — see Section 16.4.7. • FREEZE_PERFMON_ON_PMI flag (bit 12) — see Section 16.
Vol. 3 16-35 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER Processors based on Intel microarchitecture (Nehalem) have an LBR MSR Stack as shown in Table 16-8. Table 16-8. LBR Stack Size and TOS Pointer Range 16.6.2 Filtering of Last Branch Records MSR_LBR_SELECT is cleared to zero at RESET, and LBR filtering is disabled, i.
16-36 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.7 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PROCESSORS BASED ON INTEL NETBURST® MICROARCHITECTURE) Pentium 4.
Vol. 3 16-37 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • IA32_MISC_ENABLE MSR — Indicates that the processor provides the BTS facilities.
16-38 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • BTS (branch trace store) flag (bit 3) — When set, enables the BTS facilities to log BTMs to a memory-resident BTS buffer that is part of the DS save area. See Section 16.
Vol. 3 16-39 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER LBR MSR pair) that contains the most recent (last) branch record placed on the stack. Prior to placing a new branch record on the stack, the TOS is incremented by 1. When the TOS pointer reaches its maximum value, it wraps around to 0.
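The increment-then-store behavior of the TOS pointer is that of a circular buffer, which the following sketch models in software. The class, its method names, and the initial TOS value of 0 are assumptions made for illustration; the stack size would come from the processor-specific tables.

```python
class LbrStackModel:
    """Software model of an LBR stack with a wrapping top-of-stack pointer."""

    def __init__(self, size: int):
        self.records = [None] * size
        self.tos = 0  # initial value is an assumption of this model

    def push(self, from_ip: int, to_ip: int) -> None:
        # The TOS is incremented first and wraps to 0 past the maximum,
        # then the new record is written at the new TOS position.
        self.tos = (self.tos + 1) % len(self.records)
        self.records[self.tos] = (from_ip, to_ip)

    def most_recent(self):
        """The entry at TOS is the last record placed on the stack."""
        return self.records[self.tos]
```

Once the stack is full, each new record silently replaces the oldest one.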
16-40 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER Additional information is saved if an exception or interrupt occurs in conjunction with a branch instruction.
Vol. 3 16-41 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.8 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS) Intel Core Solo and Intel Core Duo processors provide last branch interrupt and exception recording.
16-42 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • Debug store (DS) feature flag (bit 21), returned by the CPUID instruction — Indicates that the processor provides the debug store (DS) mechanism, which allows BTMs to be stored in a memory-resident BTS buffer.
Vol. 3 16-43 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS) Like the Pentium 4 and Intel Xeon processor family, Pentium M processors provide last branch interrupt and exception recording.
16-44 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER — TR (trace message enable) flag (bit 6) — When set, branch trace messages are enabled. When the processor detects a taken branch, interrupt, or exception, it sends the branch record out on the system bus as a branch trace message (BTM).
Vol. 3 16-45 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER For more detail on these capabilities, see Section 16.7.3, “Last Exception Records,” and Appendix B.
16-46 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER • BTF (single-step on branches) flag (bit 1) — When set, the processor treats the TF flag in the EFLAGS register as a “single-step on branches” flag.
Vol. 3 16-47 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER tion or interrupt being generated. When an exception or interrupt occurs, the contents of the LastBranchToIP and LastBranchFro.
16-48 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.11 TIME-STAMP COUNTER The Intel 64 and IA-32 architectures (beginning with the Pentium processor) define a time-stamp counter mechanism that can be used to monitor and identify the relative time occurrence of processor events.
Vol. 3 16-49 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER NOTE To determine average processor clock frequency, Intel recommends the use of EMON logic to count processor core clocks over the period of time for which the average is required.
16-50 Vol. 3 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER 16.11.2 IA32_TSC_AUX Register and RDTSCP Support Processor based on Intel microarchitecture (Nehalem) provides an auxiliary TSC register, IA32_TSC_AUX that is designed to be used in conjunction with IA32_TSC.
Vol. 3 17-1 CHAPTER 17 8086 EMULATION IA-32 processors (beginning with the Intel386 processor) provide two ways to execute new or legacy programs that are assembled and/or compiled to run on an Intel 8086 processor: • Real-address mode. • Virtual-8086 mode.
17-2 Vol. 3 8086 EMULATION The following is a summary of the core features of the real-address mode execution environment as would be seen by a program written for the 8086: • The processor supports a nominal 1-MByte physical address space (see Section 17.
Vol. 3 17-3 8086 EMULATION • A single interrupt table, called the “interrupt vector table” or “interrupt table,” is provided for handling interrupts and exceptions (see Figure 17-2).
17-4 Vol. 3 8086 EMULATION in real-address mode, however, the processor does not truncate such an address and uses it as a physical address. (Note, however, that for IA-32 processors beginning with.
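The address arithmetic behind this difference can be illustrated with a short sketch. It assumes the standard real-address-mode calculation (16-bit segment shifted left by 4, plus a 16-bit offset); the `wrap_20_bits` parameter is a hypothetical switch modeling the original 8086's 20-bit truncation, while the default models the untruncated behavior described above.

```python
def real_mode_address(segment: int, offset: int,
                      wrap_20_bits: bool = False) -> int:
    """Form a real-address-mode linear address from segment:offset."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if wrap_20_bits:
        # 8086-style behavior: only 20 address bits exist, so the
        # carry out of bit 19 is discarded.
        linear &= 0xFFFFF
    return linear
```

With segment FFFFH and offset FFFFH the untruncated result lies above the nominal 1-MByte boundary, which is the case the text is describing.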
Vol. 3 17-5 8086 EMULATION • Move (MOV) instructions that move operands between general-purpose registers, segment registers, and between memory and general-purpose registers. • The exchange (XCHG) instruction. • Load segment register instructions LDS and LES.
17-6 Vol. 3 8086 EMULATION • Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the byte-set-on condition instruction SETcc; and the byte swap (BSWAP) instruction. • Double shift instructions SHLD and SHRD. • EFLAGS control instructions PUSHF and POPF.
Vol. 3 17-7 8086 EMULATION The interrupt vector table is an array of 4-byte entries (see Figure 17-2). Each entry consists of a far pointer to a handler procedure, made up of a segment selector and an offset. The processor scales the interrupt or exception vector by 4 to obtain an offset into the interrupt table.
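The vector scaling and entry lookup can be sketched against a byte buffer standing in for low memory. The 4-byte entry size and the scale-by-4 rule come from the text; placing the 16-bit offset in the low word and the segment in the high word follows Figure 17-2, which is not reproduced in this excerpt, so treat that ordering as an assumption. The function names are illustrative.

```python
import struct


def ivt_entry_offset(vector: int) -> int:
    """Byte offset of a vector's entry: the vector scaled by 4."""
    return vector * 4


def read_ivt_entry(memory: bytes, vector: int) -> tuple:
    """Return (segment, offset) of the handler for `vector`.

    `memory` is a bytes-like image of the interrupt table starting at
    the table base.
    """
    offset, segment = struct.unpack_from("<HH", memory,
                                         ivt_entry_offset(vector))
    return segment, offset
```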
17-8 Vol. 3 8086 EMULATION 17.2 VIRTUAL-8086 MODE Virtual-8086 mode is actually a special type of a task that runs in protected mode. When the operating-system or executive switches to a virtual-8086-mode task, the processor emulates an Intel 8086 processor.
Vol. 3 17-9 8086 EMULATION 17.2.1 Enabling Virtual-8086 Mode The processor runs in virtual-8086 mode when the VM (virtual machine) flag in the EFLAGS register is set. This flag can only be set when the processor switches to a new protected-mode task or resumes virtual-8086 mode via an IRET instruction.
17-10 Vol. 3 8086 EMULATION The processor enters virtual-8086 mode to run the 8086 program and returns to protected mode to run the virtual-8086 monitor.
Vol. 3 17-11 8086 EMULATION Paging is not necessary for a single virtual-8086-mode task, but paging is useful or necessary in the following situations: • When running multiple virtual-8086-mode tasks.
17-12 Vol. 3 8086 EMULATION When a task switch is used to enter virtual-8086 mode, the TSS for the virtual-8086-mode task must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper word of the EFLAGS register is not in the TSS, causing the processor to clear the VM flag when it loads the EFLAGS register.
Vol. 3 17-13 8086 EMULATION Figure 17-3. Entering and Leaving Virtual-8086 Mode
17-14 Vol. 3 8086 EMULATION 17.2.6 Leaving Virtual-8086 Mode The processor can leave the virtual-8086 mode only through an interrupt or exception.
Vol. 3 17-15 8086 EMULATION execution sequence after verifying that it was entered as a result of a HLT execution. See Section 17.3, “Interrupt and Exception Handling in Virtual-8086 Mode”, for information on leaving virtual-8086 mode to handle an interrupt or exception generated in virtual-8086 mode.
17-16 Vol. 3 8086 EMULATION for another task. This differs from protected mode in which, if the CPL is less than or equal to the IOPL, I/O access is allowed without checking the I/O permission bit map.
Vol. 3 17-17 8086 EMULATION In virtual-8086 mode, the interrupts and exceptions are divided into three classes for the purposes of handling: • Class 1 — All processor-generated exceptions and all hardware interrupts, including the NMI interrupt and the hardware interrupts sent to the processor’s external interrupt delivery pins.
17-18 Vol. 3 8086 EMULATION in the previous paragraphs. These sections describe three possible types of interrupt and exception handlers: • Protected-mode interrupt and exceptions handlers — These are the standard handlers that the processor calls through the protected-mode IDT.
Vol. 3 17-19 8086 EMULATION save and restore these registers regardless of the type segment selectors they contain (protected-mode or 8086-style). The interrupt and exception handlers, which may be c.
17-20 Vol. 3 8086 EMULATION Interrupt and exception handlers can examine the VM flag on the stack to determine if the interrupted procedure was running in virtual-8086 mode.
Vol. 3 17-21 8086 EMULATION 2. Store the EFLAGS (low-order 16 bits only), CS and EIP values of the 8086 program on the privilege-level 3 stack. This is the stack that the virtual-8086-mode task is using. (The 8086 handler may use or modify this information.
17-22 Vol. 3 8086 EMULATION executed must be 0, otherwise the processor does not change the state of the VM flag. 17.3.2 Class 2—Maskable Hardware Interrupt Handling in Virtual-8086 Mode Usi.
Vol. 3 17-23 8086 EMULATION CLI instruction, the processor clears the VIF flag to request that the virtual-8086 monitor inhibit maskable hardware interrupts from interrupting program execution; wh.
17-24 Vol. 3 8086 EMULATION 5. Upon returning to virtual-8086 mode, the processor continues execution of the 8086 program. When the 8086 program is ready to receive maskable hardware interrupts, it executes the STI instruction to set the VIF flag (enabling maskable hardware interrupts).
Vol. 3 17-25 8086 EMULATION tions in virtual-8086 mode in the same manner as an Intel386 or Intel486 processor does. When this flag is set, the virtual mode extension provides the following enhanceme.
17-26 Vol. 3 8086 EMULATION Table 17-2. Software Interrupt Handling Methods While in Virtual-8086 Mode Method VME IOPL Bit in Redir. Bitmap* Processor Action 1 0 3 X Interrupt directed to a.
Vol. 3 17-27 8086 EMULATION Redirecting software interrupts back to the 8086 program potentially speeds up interrupt handling because a switch back and forth between virtual-8086 mode and protected mode is not required.
17-28 Vol. 3 8086 EMULATION rupt handler in the protected-mode IDT pointed to by the interrupt vector. See Section 17.3.1, “Class 1—Hardware Interrupt and Exception Handling in Virtual-8086 Mode”, for a complete description of this mechanism and its possible uses.
Vol. 3 17-29 8086 EMULATION 3. Clears the IF flag in the EFLAGS register to disable interrupts. 4. Clears the TF flag, in the EFLAGS register. 5. Locates the 8086 program interrupt vector table at linear address 0 for the 8086-mode task.
17-30 Vol. 3 8086 EMULATION cient means of handling maskable hardware interrupts that occur during a virtual-8086 mode task. Also, because the IOPL value is less than 3 and the VIF flag is enabled,.
Vol. 3 17-31 8086 EMULATION It is only possible to enter virtual-8086 mode through a task switch or the execution of an IRET instruction, and it is only possible to leave virtual-8086 mode by faulting to a protected-mode interrupt handler (typically the general-protection exception handler, which in turn calls the virtual 8086-mode monitor).
17-32 Vol. 3 8086 EMULATION.
Vol. 3 18-1 CHAP TER 18 MIXING 16-BIT AND 32-BIT CODE Program modules written to run on IA -32 proce ssors can be either 16-bit modules or 32-bit modules. Ta b l e 1 8 - 1 shows the characteristic of 16-bit and 32-bit modules. The IA-32 processors function most efficiently when executing 32-bit program modules.
18-2 Vol. 3 MIXING 16-BIT AND 32-BIT CODE 18.1 DEFINING 16-BIT AN D 32-BIT PR OGR AM MODULES The following IA-32 architecture mechanis ms are used to distinguish between and support 16-bit and 32-bit segments and operations: • The D (default operand and address size ) flag in code-segment descriptors.
Vol. 3 18-3 MIXING 16-BIT AND 32-BIT CODE These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For example, the processor can interpret the (MOV mem, reg) instruction in any of four ways: • In a 32-bit code segment: — Moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
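The size selection described on this page can be sketched as a small function. This is a simplified model rather than a decoder: it assumes the standard prefix encodings (66H for operand size, 67H for address size), ignores 64-bit mode, and uses hypothetical function names.

```c
#include <stdbool.h>

#define OPERAND_SIZE_PREFIX 0x66  /* reverses the default operand size */
#define ADDRESS_SIZE_PREFIX 0x67  /* reverses the default address size */

/* Return true if `wanted` appears among the instruction's prefix bytes. */
bool has_prefix(const unsigned char *prefixes, int count, unsigned char wanted)
{
    for (int i = 0; i < count; i++) {
        if (prefixes[i] == wanted) {
            return true;
        }
    }
    return false;
}

/*
 * Effective operand or address size in bits.
 * d_flag is the D flag of the current code-segment descriptor
 * (set = 32-bit default, clear = 16-bit default); a size prefix
 * reverses whichever default is in force.
 */
int effective_size_bits(bool d_flag, bool prefix_present)
{
    return (d_flag != prefix_present) ? 32 : 16;
}
```

Calling `effective_size_bits` once with the operand-size prefix and once with the address-size prefix yields the four combinations the text lists for a MOV to memory in either segment type.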
18-4 Vol. 3 MIXING 16-BIT AND 32-BIT CODE 18.3 SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS Data segments can be accessed from both 16-bit and 32-bit code segments.
Vol. 3 18-5 MIXING 16-BIT AND 32-BIT CODE Likewise, there are three ways for a procedure in a 32-bit code segment to safely make a call to a 16-bit code segment: • Make the call through a 16-bit call gate. Here, the EIP value at the CALL instruction cannot exceed FFFFH.
18-6 Vol. 3 MIXING 16-BIT AND 32-BIT CODE instruction (see Figure 18-1). On a 16-bit call, the processor pushes the contents of the 16-bit IP register and (for calls between privilege levels) the 16-bit SP register.
Vol. 3 18-7 MIXING 16-BIT AND 32-BIT CODE While executing 32-bit code, if a call is made to a 16-bit code segment which is at the same or a more privileged level (that is, the DPL of the called code s.
18-8 Vol. 3 MIXING 16-BIT AND 32-BIT CODE segments can be modified to safely call procedures to 32-bit code segments in either of two ways: • Relink the CALL instruction to point to 32-bit call gates (see Section 18.4.2.2, “Passing Parameters With a Gate”).
Vol. 3 18-9 MIXING 16-BIT AND 32-BIT CODE 18.4.5 Writing Interface Procedures Placing interface code between 32-bit and 16-bit procedures can be the solution to the following interface problems: • Allowing procedures in 16-bit code segments to call procedures with offsets greater than FFFFH in 32-bit code segments.
18-10 Vol. 3 MIXING 16-BIT AND 32-BIT CODE.
Vol. 3 19-1 CHAPTER 19 ARCHITECTURE COMPATIBILITY Intel 64 and IA-32 processors are binary compatible. Compatibility means that, within limited constraints, programs that execute on previous generations of processors will produce identical results when executed on later processors.
19-2 Vol. 3 ARCHITECTURE COMPATIBILITY • Pentium D Processors — A family of dual-core Intel 64 processors that provides two processor cores in a physical package.
Vol. 3 19-3 ARCHITECTURE COMPATIBILITY original value results in a general-protection exception (#GP). So, programs that execute on the P6 family and Pentium processors cannot erroneously enable functions that may be implemented in future IA-32 processors.
19-4 Vol. 3 ARCHITECTURE COMPATIBILITY control and status register. These instructions and registers are designed to allow SIMD computations to be made on single-precision floating-point numbers. Several of these new instructions also operate in the MMX registers.
Vol. 3 19-5 ARCHITECTURE COMPATIBILITY 19.10 INTEL HYPER-THREADING TECHNOLOGY Intel Hyper-Threading Technology provides two logical processors that can execute two separate code streams (called threads) concurrently by using shared resources in a single processor core or in a physical package.
19-6 Vol. 3 ARCHITECTURE COMPATIBILITY 19.13.1 Instructions Added Prior to the Pentium Processor The following instructions were added in the Intel486 processor: • BSWAP (byte swap) instruction. • XADD (exchange and add) instruction. • CMPXCHG (compare and exchange) instruction.
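For readers unfamiliar with the three instructions listed above, the following sketch models their 32-bit register semantics in plain C. It is an illustrative model only: the function names are hypothetical, the operations are not atomic (no LOCK prefix is modeled), and of the flags only the ZF result of CMPXCHG is represented, as the return value.

```c
#include <stdbool.h>
#include <stdint.h>

/* BSWAP: reverse the byte order of a 32-bit value. */
uint32_t model_bswap(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* XADD dest, src: dest receives dest + src, src receives the old dest. */
void model_xadd(uint32_t *dest, uint32_t *src)
{
    uint32_t old = *dest;
    *dest = old + *src;
    *src = old;
}

/*
 * CMPXCHG dest, src: compare the accumulator (EAX) with dest.
 * If equal, store src in dest and report ZF = 1 (true);
 * otherwise load dest into the accumulator and report ZF = 0 (false).
 */
bool model_cmpxchg(uint32_t *dest, uint32_t *accumulator, uint32_t src)
{
    if (*dest == *accumulator) {
        *dest = src;
        return true;
    }
    *accumulator = *dest;
    return false;
}
```

In practice, XADD and CMPXCHG are typically used with the LOCK prefix to build atomic counters and compare-and-swap loops, which this sequential model does not attempt to reproduce.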
Vol. 3 19-7 ARCHITECTURE COMPATIBILITY • Single-bit instructions. • Bit scan instructions. • Double-shift instructions. • Byte set on condition instruction. • Move with sign/zero extension. • Generalized multiply instruction. • MOV to and from control registers.
19-8 Vol. 3 ARCHITECTURE COMPATIBILITY The following flags were added to the EFLAGS register in the Pentium processor: • VIF (virtual interrupt flag), bit 19. • VIP (virtual interrupt pending), bit 20. • ID (identification flag), bit 21. The AC flag (bit 18) was added to the EFLAGS register in the Intel486 processor.
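The bit positions given above translate directly into masks. The sketch below is a minimal illustration that operates on a saved copy of EFLAGS held in an ordinary integer; the macro and function names are hypothetical, and reading or writing the real register (for example with PUSHFD/POPFD) is outside its scope.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the text. */
#define EFLAGS_AC  (1u << 18)  /* alignment check, added in the Intel486 */
#define EFLAGS_VIF (1u << 19)  /* virtual interrupt flag, added in the Pentium */
#define EFLAGS_VIP (1u << 20)  /* virtual interrupt pending, added in the Pentium */
#define EFLAGS_ID  (1u << 21)  /* identification flag, added in the Pentium */

/* Test whether a flag is set in a saved EFLAGS image. */
bool eflags_test(uint32_t eflags, uint32_t mask)
{
    return (eflags & mask) != 0;
}

/* Return the image with the given flag inverted. */
uint32_t eflags_toggle(uint32_t eflags, uint32_t mask)
{
    return eflags ^ mask;
}
```

A common use of such a mask is the check for CPUID support, in which software attempts to toggle the ID flag and then reads EFLAGS back to see whether the change persisted.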
Vol. 3 19-9 ARCHITECTURE COMPATIBILITY XCHG BP, [BP] This code functions as the 8086 processor PUSH SP instruction on the P6 family, Pentium, Intel486, Intel386, and Intel 286 processors.
19-10 Vol. 3 ARCHITECTURE COMPATIBILITY math coprocessor (flag is clear) or an Intel 387 DX math coprocessor (flag is set). This bit is hardwired to 1 in the P6 family, Pentium, and Intel486 processors.
Vol. 3 19-11 ARCHITECTURE COMPATIBILITY On the 32-bit x87 FPUs, the C2 flag serves as an incomplete flag for the FPTAN instruction. On the 16-bit IA-32 math coprocessors, the C2 flag is undefined for the FPTAN instruction.
19-12 Vol. 3 ARCHITECTURE COMPATIBILITY Software written to run on a 16-bit IA-32 math coprocessor may not operate correctly on a 16-bit x87 FPU, if it uses the FLDENV, FRSTOR, or FXRSTOR instructions to change tags to values (other than to empty) that are different from actual register contents.
Vol. 3 19-13 ARCHITECTURE COMPATIBILITY ters. The only effect may be in how software handles the tags in the tag word (see also: Section 19.18.4, “x87 FPU Tag Word”).
19-14 Vol. 3 ARCHITECTURE COMPATIBILITY The difference is apparent only to the exception handler. This difference is for IEEE Standard 754 compatibility.
Vol. 3 19-15 ARCHITECTURE COMPATIBILITY the 8087 interrupt, both exception vectors should call the floating-point-error exception handler. Some instructions in a floating-point-error exception handler may need to be deleted if they use the interrupt controller.
19-16 Vol. 3 ARCHITECTURE COMPATIBILITY 19.18.6.9 Alignment Check Exceptions (#AC) If alignment checking is enabled, a misaligned data operand on the P6 family, Pentium, and Intel486 processors c.
Vol. 3 19-17 ARCHITECTURE COMPATIBILITY 19.18.7 Changes to Floating-Point Instructions This section identifies the differences in floating-point instructions for the various Intel FPU and math coprocessor architectures, the reason for the differences, and their impact on software.
19-18 Vol. 3 ARCHITECTURE COMPATIBILITY tions do not exist on the 16-bit IA-32 math coprocessors. The availability of these new instructions has no impact on existing software.
Vol. 3 19-19 ARCHITECTURE COMPATIBILITY arithmetic. The 16-bit IA-32 math coprocessors do report a denormal-operand exception in this situation. This difference does not affect existing software. On the 32-bit x87 FPUs, loading a denormal value that is in single- or double-real format causes the value to be converted to extended-real format.
19-20 Vol. 3 ARCHITECTURE COMPATIBILITY FPUs handle all addressing and exception-pointer information, whether in protected mode or not. 19.18.7.15 FXAM Instruction With the 32-bit x87 FPUs, if the FPU encounters an empty register when executing the FXAM instruction, it will not generate combinations of C0 through C3 equal to 1101 or 1111.
Vol. 3 19-21 ARCHITECTURE COMPATIBILITY 19.18.10 WAIT/FWAIT Prefix Differences On the Intel486 processor, when a WAIT/FWAIT instruction precedes a floating-point instruction (one which itself automatically synchronizes with the previous floating-point instruction), the WAIT/FWAIT instruction is treated as a no-op.
19-22 Vol. 3 ARCHITECTURE COMPATIBILITY 19.20 FPU AND MATH COPROCESSOR INITIALIZATION Table 9-1 shows the states of the FPUs in the P6 family, Pentium, Intel486 processors and of the Intel 387 math coprocessor and Intel 287 coprocessor following a power-up, reset, or INIT, or following the execution of an FINIT/FNINIT instruction.
Vol. 3 19-23 ARCHITECTURE COMPATIBILITY Following is an example code sequence to initialize the system and check for the presence of Intel486 SX processor/Intel 487 SX math coprocessor.
19-24 Vol. 3 ARCHITECTURE COMPATIBILITY 19.21 CONTROL REGISTERS The following sections identify the new control registers and control register flags and fields that were introduced to the 32-bit IA-32 in various processor families. See Figure 2-6 for the location of these flags and fields in the control registers.
Vol. 3 19-25 ARCHITECTURE COMPATIBILITY • NE — Numeric error. Enables the normal mechanism for reporting floating-point numeric errors. • WP — Write protect. Write-protects read-only pages against supervisor-mode accesses. • AM — Alignment mask.
19-26 Vol. 3 ARCHITECTURE COMPATIBILITY 19.22.1.2 Global Pages The new PGE (page global enable) flag in control register CR4, bit 7, provides a mechanism for preventing frequently used pages from being flushed from the translation lookaside buffer (TLB).
Vol. 3 19-27 ARCHITECTURE COMPATIBILITY 19.22.4 Changes in Segment Descriptor Loads On the Intel386 processor, loading a segment descriptor always causes a locked read and write to set the accessed bit of the descriptor.
19-28 Vol. 3 ARCHITECTURE COMPATIBILITY are enabled (the DE flag is set), attempts to reference registers DR4 or DR5 will result in an invalid-opcode exception (#UD).
Vol. 3 19-29 ARCHITECTURE COMPATIBILITY may not be implemented or implemented differently in future processors. The MCE flag in control register CR4 enables the machine-check exception. When this bit is clear (which it is at reset), the processor inhibits generation of the machine-check exception.
19-30 Vol. 3 ARCHITECTURE COMPATIBILITY 19.25.1 Machine-Check Architecture The Pentium Pro processor introduced a new architecture to the IA-32 for handling and reporting on machine-check exceptions.
Vol. 3 19-31 ARCHITECTURE COMPATIBILITY 19.26.3 IDT Limit The LIDT instruction can be used to set a limit on the size of the IDT. A double-fault exception (#DF) is generated if an interrupt or exception attempts to read a vector beyond the limit.
19-32 Vol. 3 ARCHITECTURE COMPATIBILITY • The remote read delivery mode provided in the 82489DX and local APIC for Pentium processors is not supported in the local APIC in the Pentium 4, Intel Xeon, and P6 family processors.
Vol. 3 19-33 ARCHITECTURE COMPATIBILITY 19.28.1 P6 Family and Pentium Processor TSS When the virtual mode extensions are enabled (by setting the VME flag in control register CR4), the TSS in the P.
19-34 Vol. 3 ARCHITECTURE COMPATIBILITY than 0DFFFH, the Intel486 processor will not wrap around and access incorrect locations within the TSS for I/O port validation and the P6 family and Pentium processors will not experience general-protection exceptions (#GP).
Vol. 3 19-35 ARCHITECTURE COMPATIBILITY data cache and L2 cache of the P6 family processors. In the Intel486 processor, setting these flags to (00B) enables write-through for the cache. External system hardware can force the Pentium processor to disable caching or to use the write-through cache policy should that be required.
19-36 Vol. 3 ARCHITECTURE COMPATIBILITY 19.29.2 Disabling the L3 Cache A unified third-level (L3) cache in processors based on Intel NetBurst microarchitecture (see Section 11.1, “Internal Caches, TLBs, and Buffers”) provides the third-level cache disable flag, bit 6 of the IA32_MISC_ENABLE MSR.
Vol. 3 19-37 ARCHITECTURE COMPATIBILITY 19.30.3 Enabling and Disabling Paging Paging is enabled and disabled by loading a value into control register CR0 that modifies the PG flag.
19-38 Vol. 3 ARCHITECTURE COMPATIBILITY • The initial stack pointer is FFFCH (32-bit operand) or FFFEH (16-bit operand) and will wrap around to 0H as a result of the POP operation.
Vol. 3 19-39 ARCHITECTURE COMPATIBILITY 19.32 MIXING 16- AND 32-BIT SEGMENTS The features of the 16-bit Intel 286 processor are an object-code compatible subset of those of the 32-bit IA-32 processors.
19-40 Vol. 3 ARCHITECTURE COMPATIBILITY 19.33.1 Segment Wraparound On the 8086 processor, an attempt to access a memory operand that crosses offset 65,535 or 0FFFFH or offset 0 (for example, moving a word to offset 65,535 or pushing a word when the stack pointer is set to 1) causes the offset to wrap around modulo 65,536 or 010000H.
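The modulo-65,536 behavior described for the 8086 falls out naturally from 16-bit unsigned arithmetic, as the short sketch below shows. It models only the 8086 case from the paragraph above, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Offsets of the two bytes touched by a word access at `offset`
 * within a segment on the 8086: the second byte wraps modulo 65,536.
 */
void word_access_offsets_8086(uint16_t offset, uint16_t out[2])
{
    out[0] = offset;
    out[1] = (uint16_t)(offset + 1u);
}

/* Stack pointer after pushing a word: SP - 2, wrapping modulo 65,536. */
uint16_t sp_after_push_8086(uint16_t sp)
{
    return (uint16_t)(sp - 2u);
}
```

With these definitions, a word moved to offset 0FFFFH occupies offsets 0FFFFH and 0, and a push with the stack pointer set to 1 leaves the stack pointer at 0FFFFH, so the pushed word again straddles the segment boundary.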
Vol. 3 19-41 ARCHITECTURE COMPATIBILITY with the exception of “fast string” store operations (see Section 8.2.4, “Out-of-Order Stores For String Operations”). The Pentium processor has two store buffers, one corresponding to each of the pipelines.
19-42 Vol. 3 ARCHITECTURE COMPATIBILITY memory. If the access does split across a cache line, it locks the bus and accesses system memory. I/O reads are never reordered in front of buffered memory writes on an IA-32 processor. This ensures an update of all memory locations before reading the status from an I/O device.
Vol. 3 19-43 ARCHITECTURE COMPATIBILITY sors. The following sections describe these model-specific extensions. The CPUID instruction indicates the availability of some of the model-specific features.
19-44 Vol. 3 ARCHITECTURE COMPATIBILITY Earlier IA-32 processors (such as the Intel486 and Pentium processors) used the KEN# (cache enable) pin and external logic to maintain an external memory map and signal cacheable accesses to the processor.
Vol. 3 19-45 ARCHITECTURE COMPATIBILITY The performance-monitoring counters are useful for debugging programs, optimizing code, diagnosing system failures, or refining hardware designs. See Chapter 30, “Performance Monitoring,” for more information on these counters.
19-46 Vol. 3 ARCHITECTURE COMPATIBILITY.