
[Topic Unique] Processeurs AMD Bulldozer FX-8100/6100/4100 (32nm)

n°7877861
Marc
Hunter of joce & sly
Posted on 25-04-2011 at 18:50:25
 

Gigathlon wrote:


Not really, since at Intel it depends on temperature... that said, it doesn't change much [:yamusha]
 
Basing it on power draw is interesting if there is no "severe" hot spot, whereas basing it on temperature is generally biased by that same hot-spot problem (since that is usually where the sensor is placed).


Huh? :heink:


n°7877867
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 18:51:55
 

AMD Family 15h version 1  AND version 2:
 
 Page 20: AMD Family 15h processors have multiple compute units, each containing its own L2 cache and two cores. The cores share their compute unit’s L2 cache. Each core incorporates the complete x86 instruction set logic and L1 data cache. Compute units share the processor’s L3 cache and Northbridge (see Chapter 2, Microarchitecture of AMD Family 15h Processors).
 
 Page 23: AMD Instruction Set Enhancements - The AMD Family 15h processor has been enhanced with the following new instructions:
 • XOP and AVX support—Extended Advanced Vector Extensions provide enhanced instruction encodings and non-destructive operands with an extended set of 128-bit (XMM) and 256-bit (YMM) media registers
 • FMA instructions—support for floating-point fused multiply accumulate instructions
 • Fractional extract instructions—extract the fractional portion of vector and scalar single-precision and double-precision floating-point operands
 • Support for new vector conditional move instructions.
 • VPERMILx instructions—allow selective permutation of packed double- and single-precision floating point operands
 • VPHADDx/VPHSUBx—support for packed horizontal add and subtract instructions
 • Support for packed multiply, add and accumulate instructions
 • Support for new vector shift and rotate instructions
 
 Page 23: AMD Family 15h processors add support for 128-bit floating-point execution units. As a result, the throughput of both single-precision and double-precision floating-point SIMD vector operations has improved by 2X over the previous generation of AMD processors.
 
 Page 25: Instruction Fetching Improvements - While previous AMD64 processors had a single 32-byte fetch window, AMD Family 15h processors have two 32-byte fetch windows, from which four μops can be selected. These fetch windows, when combined with the 128-bit floating-point execution unit, allow the processor to sustain a fetch/dispatch/retire sequence of four instructions per cycle.
 
 Page 26: Several integer and floating-point instructions have improved latencies and decode types on AMD Family 15h processors.
 
 Page 26: Current AMD Family 15h processors support two SIMD logical/shuffle units, one in the FMUL pipe and another in the FADD pipe, while previous AMD64 processors have only one SIMD logical/shuffle unit in the FMUL pipe. As a result, the SIMD shuffle instructions can be processed at twice the previous bandwidth on AMD Family 15h processors. Furthermore, the PSHUFD and SHUFPx shuffle instructions are now DirectPath instructions instead of VectorPath instructions on AMD Family 15h processors and take advantage of the 128-bit floating point execution units. Hence, these instructions get a further 2X boost in bandwidth, resulting in an overall improvement of 4X in bandwidth compared to the previous generation of AMD processors.
 
 Page 26: Notable Performance Improvements - Several enhancements to the AMD64 architecture have resulted in significant performance improvements in AMD Family 15h processors, including:
 • Improved performance of shuffle instructions
 • Improved data transfer between floating-point registers and general purpose registers
 • Improved floating-point register to floating-point register moves
 • Optimization of repeated move instructions
 • More efficient PUSH/POP stack operations
 • 1-Gbyte paging
 
 Page 30: Key Microarchitecture Features - AMD Family 15h processors include many features designed to improve software performance. The internal design, or microarchitecture, of these processors provides the following key features:
 • Integrated DDR3 memory controller with memory prefetcher
 • 64-Kbyte L1 instruction cache and 16-Kbyte L1 data cache
 • Shared L2 cache between cores of compute unit
 • Shared L3 cache compute units on chip (for supported platforms)
 • 32-byte instruction fetch
 • Instruction predecode and branch prediction during cache-line fills
 • Decoupled prediction and instruction fetch pipelines
 • Four-way AMD64 instruction decoding (This is a theoretical limit.)
 • Dynamic scheduling and speculative execution
 • Two-way integer execution
 • Two-way address generation
 • Two-way 128-bit wide floating-point execution
 • Legacy single-instruction multiple-data (SIMD) instruction extensions, as well as support for XOP, FMA4, VPERMILx, and Advanced Vector Extensions (AVX).
 • Superforwarding
 • Prefetch into L2 or L1 data cache
 • Deep out-of-order integer and floating-point execution
 • HyperTransport™ technology
 
 Page 30: Microarchitecture of AMD Family 15h Processors - AMD Family 15h processors implement the AMD64 instruction set by means of macro-ops (the primary units of work managed by the processor) and micro-ops (the primitive operations executed in the processor's execution units). These are simple fixed-length operations designed to include direct support for AMD64 instructions and adhere to the high-performance principles of fixed-length encoding, regularized instruction fields, and a large register set. This enhanced microarchitecture enables higher processor core performance and promotes straightforward extensibility for future designs.
 
 Page 31: Superscalar Processor - The AMD Family 15h processors are aggressive, out-of-order, four-way superscalar AMD64 processors. They can theoretically fetch, decode, and issue up to four AMD64 instructions per cycle using decoupled fetch and branch prediction units and three independent instruction schedulers, consisting of two integer schedulers and one floating-point scheduler. These processors can fetch 32 bytes per cycle and can scan two 16-byte instruction windows for up to four micro-ops, which can be dispatched together in a single cycle. However, this is a theoretical limit.
 
 Page 33:
 L1 Instruction Cache - The out-of-order execution engine of AMD Family 15h processors contains a 64-Kbyte, 2-way set-associative L1 instruction cache. Each line in this cache is 64 bytes long. However, only 32 bytes are fetched in every cycle.
 L1 Data Cache - The AMD Family 15h processor contains a 16-Kbyte, 4-way predicted L1 data cache with two 128-bit ports. This is a write-through cache that supports up to two 128-bit loads per cycle.
 L2 Cache - The AMD Family 15h processor has one shared L2 cache per compute unit. This full-speed on-die L2 cache is mostly inclusive relative to the L1 cache. The L2 is a write-through cache.
 L3 Cache - The AMD Family 15h processor supports a maximum of 8MB of L3 cache per die, distributed among four L3 sub-caches which can each be up to 2MB in size.
 
 Page 35: The scheduling for integer operations is fully data-dependency driven; proceeding out-of-order based on the validity of source operands and the availability of execution resources. Since the Bulldozer core implements a floating point co-processor model of operation, most scheduling and execution decisions of floating-point operations are handled by the floating point unit.
 
 Page 37: Floating-Point Unit - The AMD Family 15h processor floating point unit (FPU) was designed to provide four times the raw FADD and FMUL bandwidth as the original AMD Opteron and Athlon 64 processors. It achieves this by means of two 128-bit fused multiply-accumulate (FMAC) units which are supported by a 128-bit high-bandwidth load-store system. The FPU is a coprocessor model that is shared between the two cores of one AMD Family 15h compute unit. As such it contains its own scheduler, register files and renamers and does not share them with the integer units. This decoupling provides optimal performance of both the integer units and the FPU. In addition to the two FMACs, the FPU also contains two 128-bit integer units which perform arithmetic and logical operations on AVX, MMX and SSE packed integer data.
 
 Integrated Memory Controller - AMD Family 15h processors provide integrated low-latency, high-bandwidth DDR3 memory controllers. The memory controller supports:
 • DRAM chips that are 4, 8, and 16 bits wide within a DIMM.
 • Interleaving memory within DIMMs.
 • ECC checking with single symbol correcting and double symbol detecting.
 • Dual-independent 64-bit channel operation.
 • Optimized scheduling algorithms and access pattern predictors to improve latency and achieved bandwidth, particularly for interleaved streams of read and write DRAM accesses.
 • A data prefetcher.
 
 Page 40: HyperTransport3 increases the aggregate link bandwidth to a maximum of 25.6 Gbyte/s (16-bit link).
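
The 25.6 Gbyte/s figure can be reproduced with simple arithmetic. The sketch below assumes a 3.2 GHz link clock with double-data-rate signalling (a value not given in the excerpt) and counts both directions of the link. The function name is purely illustrative.

```python
def ht_aggregate_bandwidth_gbytes(link_clock_ghz: float, width_bits: int) -> float:
    """Aggregate (both directions) bandwidth of one HyperTransport link in Gbyte/s."""
    transfers_per_ns = link_clock_ghz * 2      # double data rate: 2 transfers per clock
    bytes_per_transfer = width_bits / 8        # link width expressed in bytes
    per_direction = transfers_per_ns * bytes_per_transfer
    return per_direction * 2                   # one link carries traffic in both directions


# Assumption: 3.2 GHz link clock (not stated in the quoted text).
HT3_16BIT = ht_aggregate_bandwidth_gbytes(3.2, 16)
print(HT3_16BIT)
```

Under the same assumptions, an 8-bit link would give half of that figure.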
 
 Page 167: AMD Family 15h processors with 128-bit multipliers and adders achieve better throughput using SIMD instructions. (Double precision throughput is 2× and single precision is 4× the throughput of x87.) ... The SIMD instructions provide a theoretical single-precision peak throughput of four additions and four multiplications per clock cycle, whereas x87 instructions can only sustain one addition and one multiplication per clock cycle. The double-precision peak throughput of the SIMD instructions is two additions and two multiplications per clock cycle.
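
The throughput ratios quoted on page 167 follow from how many elements fit in a 128-bit unit. A minimal sketch of that arithmetic, with illustrative names that do not come from the guide:

```python
UNIT_WIDTH_BITS = 128      # width of the SIMD adder and multiplier, per the excerpt
X87_OPS_PER_CYCLE = 1      # one addition (and one multiplication) per cycle, per the excerpt


def simd_ops_per_cycle(element_bits: int) -> int:
    """Elements processed per cycle by one 128-bit adder or multiplier."""
    return UNIT_WIDTH_BITS // element_bits


single = simd_ops_per_cycle(32)   # single precision: 32-bit elements
double = simd_ops_per_cycle(64)   # double precision: 64-bit elements

print("single-precision speedup over x87:", single // X87_OPS_PER_CYCLE)
print("double-precision speedup over x87:", double // X87_OPS_PER_CYCLE)
```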

n°7877883
thevv27
Posted on 25-04-2011 at 18:57:02
 

Wirmish wrote:

[:orly2]  
 
 
Fake of the day... [:keuah] ... or not? I'll let you decide.    
http://news.mydrivers.com/Img/20110425/03470118.jpg


vincentchandra wrote:


 :pt1cable: Impossible
But if it were true....  :ouch:  


I would so love for it to be true* (and **)
 
*: even if it seems impossible  :pt1cable:  
**: but if it is true  [:nico54]  
 
 ;)  

n°7877905
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 19:05:36
 

1> AMD Family 15h processors include many features designed to improve software performance. The
 internal design, or microarchitecture, of these processors provides the following key features:

 • Integrated DDR3 memory controller with memory prefetcher
 • 64-Kbyte L1 instruction cache and 16-Kbyte L1 data cache
 • Shared L2 cache between cores of compute unit
 • Shared L3 cache compute units on chip (for supported platforms)
 • 32-byte instruction fetch
 • Instruction predecode and branch prediction during cache-line fills
 • Decoupled prediction and instruction fetch pipelines
 • Four-way AMD64 instruction decoding (This is a theoretical limit. See section 2.3 on page 31.)
 • Dynamic scheduling and speculative execution
 • Two-way integer execution
 • Two-way address generation
 • Two-way 128-bit wide floating-point execution
 • Legacy single-instruction multiple-data (SIMD) instruction extensions, as well as support for
    XOP, FMA4, VPERMILx, and Advanced Vector Extensions (AVX).
 • Superforwarding
 • Prefetch into L2 or L1 data cache
 • Deep out-of-order integer and floating-point execution
 • HyperTransport™ technology
 
Several enhancements to the AMD64 architecture have resulted in significant performance
improvements in AMD Family 15h processors, including:

• Improved performance of shuffle instructions
• Improved data transfer between floating-point registers and general purpose registers
• Improved floating-point register to floating-point register moves
• Optimization of repeated move instructions
• More efficient PUSH/POP stack operations
• 1-Gbyte paging
• Load-Execute Instructions for Unaligned Data
   Use load-execute instructions instead of discrete load and execute instructions when performing
   SIMD integer, SIMD floating-point and x87 computations on floating-point source operands. This is
   recommended regardless of the alignment of packed data on AMD Family 15h processors. (The use
   of load-execute instructions under these circumstances was only recommended for aligned packed
   data on the previous AMD64 processors.) This replacement is only possible if the misaligned
   exception mask (MM) is set. This optimization can be especially useful in vectorized
   SIMD loops and may eliminate the need for loop peeling due to nonalignment.

 
 
2> AMD Family 15h Processor Cache Operations
 
AMD Family 15h processors use four different caches to accelerate instruction execution and data
processing:

• L1 instruction cache
• L1 data cache
• Shared compute unit L2 cache
• Shared on chip L3 cache (on supported platforms)
 
2.1> L1 Instruction Cache
The out-of-order execution engine of AMD Family 15h processors contains a 64-Kbyte, 2-way set
associative L1 instruction cache. Each line in this cache is 64 bytes long. However, only 32 bytes
are fetched in every cycle. Functions associated with the L1 instruction cache are instruction loads,
instruction prefetching, instruction predecoding, and branch prediction. Requests that miss in the L1
instruction cache are fetched from the L2 cache or, subsequently, from the L3 cache or system
memory.
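
The size, associativity and line length quoted above determine the number of sets in the L1 instruction cache. The sketch below works that out; the `set_index` helper assumes plain modulo indexing, which is a simplification and not something the guide states.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


def set_index(address: int, sets: int, line_bytes: int) -> int:
    """Set an address maps to, ASSUMING plain modulo indexing (hypothetical)."""
    return (address // line_bytes) % sets


# 64-Kbyte, 2-way, 64-byte lines, as described in the text.
L1I_SETS = cache_sets(64 * 1024, ways=2, line_bytes=64)
print(L1I_SETS)
```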
 
On misses, the L1 instruction cache generates fill requests to a naturally aligned 64-byte line
containing the instructions and the next sequential line of bytes (a prefetch). Because code typically
exhibits spatial locality, prefetching is an effective technique for avoiding decode stalls. Cache-line
replacement is based on a least-recently-used replacement algorithm.
 
Predecoding begins as the L1 instruction cache is filled. Predecode information is generated and
stored alongside the instruction cache. This information is used to help efficiently identify the
boundaries between variable length AMD64 instructions.
 
2.2> L1 Data Cache
The AMD Family 15h processor contains a 16-Kbyte, 4-way predicted L1 data cache with two 128-bit
ports. This is a write-through cache that supports up to two 128-bit loads per cycle. It is divided
into 16 banks, each 16 bytes wide. In addition, the L1 cache is protected from single bit errors through
the use of parity. There is a hardware prefetcher that brings data into the L1 data cache to avoid
misses. The L1 data cache has a 4-cycle load-to-use latency. Only one load can be performed from a
given bank of the L1 cache in a single cycle.
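
The one-load-per-bank rule means two loads issued in the same cycle can collide. The sketch below assumes the bank is selected by the address bits just above the 16-byte bank width (a guess at the mapping, not taken from the guide), so addresses 256 bytes apart would land in the same bank.

```python
BANKS = 16        # number of banks, per the text
BANK_WIDTH = 16   # bytes per bank, per the text


def bank_of(address: int) -> int:
    """Bank an address falls into, ASSUMING simple interleaving on 16-byte chunks."""
    return (address // BANK_WIDTH) % BANKS


def loads_conflict(addr_a: int, addr_b: int) -> bool:
    """True if two same-cycle loads would target the same bank under that assumption."""
    return bank_of(addr_a) == bank_of(addr_b)


print(loads_conflict(0x1000, 0x1010))  # neighbouring 16-byte chunks
print(loads_conflict(0x1000, 0x1100))  # 256 bytes apart
```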
 
2.3> L2 Cache
The AMD Family 15h processor has one shared L2 cache per compute unit. This full-speed on-die L2
cache is mostly inclusive relative to the L1 cache. The L2 is a write-through cache. Every time a store
is performed in a core, that address is written into both the L1 data cache of the core the store belongs
to and the L2 cache (which is shared between the two cores). The L2 cache has an 18-20 cycle load to
use latency.
 
Size and associativity of the AMD Family 15h processor L2 cache is implementation dependent. See
the appropriate BIOS and Kernel Developer's Guide for details.
 
2.4> L3 Cache
The AMD Family 15h processor supports a maximum of 8MB of L3 cache per die, distributed among
four L3 sub-caches which can each be up to 2MB in size. The L3 cache is considered a non-inclusive
victim cache architecture optimized for multi-core AMD processors. Only L2 evictions cause
allocations into the L3 cache. Requests that hit in the L3 cache can either leave the data in the L3
cache—if it is likely the data is being accessed by multiple cores—or remove the data from the L3
cache (and place it solely in the L1 cache, creating space for other L2 victim/copy-backs), if it is likely
the data is only being accessed by a single core. Furthermore, the L3 cache of the AMD Family 15h
processor also features a number of micro-architectural improvements that enable higher
bandwidth.
 
 
3> Branch-Prediction
 
To predict and accelerate branches, AMD Family 15h processors employ a combination of next-address
logic, a 2-level branch target buffer (BTB) for branch identification and direct target
prediction, a return address stack used for predicting return addresses, an indirect target predictor for
predicting indirect jump and call addresses, a hybrid branch predictor for predicting conditional
branch directions, and a fetch window tracking structure (BSR). Predicted-taken branches incur a 1-
cycle bubble in the branch prediction pipeline when they are predicted by the L1 BTB, and a 4-cycle
bubble in the case where they are predicted by the L2 BTB. The minimum branch misprediction
penalty is 20 cycles in the case of conditional and indirect branches and 15 cycles for unconditional
direct branches and returns.
 
The BTB is a tagged two-level set associative structure accessed using the fetch address of the current
window. Each BTB entry includes information about a branch and its target. The L1 BTB contains
128 sets of 4 ways for a total of 512 entries, while the L2 BTB has 1024 sets of 5 ways for a total of
5120 entries.
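
The BTB capacities and the minimum misprediction penalties above lend themselves to a quick back-of-the-envelope check. In the sketch below the function names and the branch counts passed in are illustrative, and the result is only a lower bound since the guide gives minimum penalties.

```python
def btb_entries(sets: int, ways: int) -> int:
    """Total entries of a set-associative BTB level."""
    return sets * ways


MIN_PENALTY_COND_INDIRECT = 20   # cycles, conditional and indirect branches
MIN_PENALTY_DIRECT_RETURN = 15   # cycles, unconditional direct branches and returns


def min_misprediction_cycles(cond_or_indirect: int, direct_or_return: int) -> int:
    """Lower bound on cycles lost to the given numbers of mispredicted branches."""
    return (cond_or_indirect * MIN_PENALTY_COND_INDIRECT
            + direct_or_return * MIN_PENALTY_DIRECT_RETURN)


print(btb_entries(128, 4), btb_entries(1024, 5))
print(min_misprediction_cycles(10, 4))  # hypothetical misprediction counts
```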
 
The hybrid branch predictor is used for predicting conditional branches. It consists of a global
predictor, a local predictor and a selector that tracks whether each branch is correlating better with the
global or local predictor. The selector and local predictor are indexed with a linear address hash. The
global predictor is accessed via a 2-bit address hash and a 12-bit global history.
 
AMD Family 15h processors implement a separate 512-entry indirect target array used to predict
indirect branches with multiple dynamic targets.
 
In addition, the processors implement a 24-entry return address stack to predict return addresses from
a near or far call. Most of the time, as calls are fetched, the next return address is pushed onto the
return stack and subsequent returns pop a predicted return address off the top of the stack. However,
mispredictions sometimes arise during speculative execution. Mechanisms exist to restore the stack to
a consistent state after these mispredictions.
 
 
4> Instruction Fetch and Decode
 
While previous AMD64 processors had a single 32-byte fetch window, AMD Family 15h processors
have two 32-byte fetch windows, from which four μops can be selected. These fetch windows, when
combined with the 128-bit floating-point execution unit, allow the processor to sustain a
fetch/dispatch/retire sequence of four instructions per cycle. Most instructions decode to a single μop,
but fastpath double instructions decode to two μops. ALU instructions can also issue four μops per
cycle and microcoded instructions should be considered single issue. Thus, there is not necessarily a
one-to-one correspondence between the decode size of assembler instructions and the capacity of the
32-byte fetch window and the production of optimal assembler code requires considerable attention
to the details of the underlying programming constraints.
 
Assembly language programmers can now group more instructions together but must still concern
themselves with the possibility that an instruction may span a 32-byte fetch window. In this regard, it
is also advisable to align hot loops to 32 bytes instead of 16 bytes, especially in the case of loops for
large SIMD instructions.
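
Whether an instruction spans a 32-byte fetch window, and how much padding a loop needs to start on a 32-byte boundary, can be checked with a few lines. This is only a sketch; the helper names and the example addresses are made up for illustration.

```python
FETCH_WINDOW = 32  # bytes, per the text


def spans_fetch_window(start: int, length: int) -> bool:
    """True if an instruction of `length` bytes at `start` crosses a 32-byte boundary."""
    first_byte_window = start // FETCH_WINDOW
    last_byte_window = (start + length - 1) // FETCH_WINDOW
    return first_byte_window != last_byte_window


def padding_to_align(address: int, alignment: int = FETCH_WINDOW) -> int:
    """Padding bytes needed so that `address` lands on the next aligned boundary."""
    return (-address) % alignment


print(spans_fetch_window(28, 4))   # bytes 28..31 stay inside one window
print(spans_fetch_window(30, 4))   # bytes 30..33 cross into the next window
print(padding_to_align(0x401013))  # hypothetical loop entry address
```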
 
AMD Family 15h processors can theoretically fetch 32B of instructions per cycle and send these
instructions to the Decode Unit (DE) in 16B windows through the 16-entry (per-thread) Instruction
Byte Buffer (IBB). The Decode Unit can only scan two of these 16B windows in a given cycle for up
to four instructions. If four instructions partially or wholly exist in more than two of these windows,
only those instructions within the first and second windows will be decoded. Aligning to 16B
boundaries is important to achieve full decode performance.
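
The two-window decode rule can be modelled roughly as follows. This is a simplified sketch, not the actual hardware behaviour: it assumes an instruction is decoded only if it ends within the first two 16B windows, counted from the window holding the first instruction, and it ignores every other decode restriction.

```python
WINDOW = 16          # bytes per decode window, per the text
MAX_PER_CYCLE = 4    # instructions scanned per cycle, per the text


def decoded_this_cycle(start_offset: int, lengths: list[int]) -> int:
    """Count the instructions decoded in one cycle under the simplified rule above."""
    first_window = start_offset // WINDOW
    limit = (first_window + 2) * WINDOW   # first byte past the second window
    position = start_offset
    count = 0
    for length in lengths:
        if count == MAX_PER_CYCLE or position + length > limit:
            break
        position += length
        count += 1
    return count


# Four 8-byte instructions: misaligned start versus a 16B-aligned start.
print(decoded_this_cycle(15, [8, 8, 8, 8]))
print(decoded_this_cycle(16, [8, 8, 8, 8]))
```

In this model the misaligned sequence wastes most of its first window, which illustrates why the text recommends aligning to 16B boundaries.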
 
 
5> Integer Execution
 
The integer execution unit for the AMD Family 15h processor consists of two components:
• the integer datapath
• the instruction scheduler and retirement control
 
These two components are responsible for all integer execution (including address generation) as well
as coordination of all instruction retirement and exception handling. The instruction scheduler and
retirement control tracks instruction progress from dispatch, issue, execution and eventual retirement.
The scheduling for integer operations is fully data-dependency driven; proceeding out-of-order based
on the validity of source operands and the availability of execution resources.
 
Since the Bulldozer core implements a floating point co-processor model of operation, most
scheduling and execution decisions of floating-point operations are handled by the floating point unit.
However, the scheduler does track the completion status of all outstanding operations and is the final
arbiter for exception processing and recovery.
 
 
6> Translation-Lookaside Buffer
 
A translation-lookaside buffer (TLB) holds the most-recently-used page mapping information. It
assists and accelerates the translation of virtual addresses to physical addresses.
 
The AMD Family 15h processors utilize a two-level TLB structure.
 
6.1> L1 Instruction TLB Specifications
The AMD Family 15h processor contains a fully-associative L1 instruction TLB with 48 4-Kbyte
page entries and 24 2-Mbyte or 1-Gbyte page entries. 4-Mbyte pages require two 2-Mbyte entries;
thus, the number of entries available for 4-Mbyte pages is one half the number of 2-Mbyte page
entries.
 
6.2> L1 Data TLB Specifications
The AMD Family 15h processor contains a fully-associative L1 data TLB with 32 entries for 4-
Kbyte, 2-Mbyte, and 1-Gbyte pages. 4-Mbyte pages require two 2-Mbyte entries; thus, the number of
entries available for 4-Mbyte pages is one half the number of 2-Mbyte page entries.
 
6.3> L2 Instruction TLB Specifications
The AMD Family 15h processor contains a 4-way set-associative L2 instruction TLB with 512 4-Kbyte
page entries.
 
6.4> L2 Data TLB Specifications
The AMD Family 15h processor contains an L2 data TLB and page walk cache (PWC) with 1024 4-
Kbyte, 2-Mbyte or 1-Gbyte page entries (8-way set-associative). 4-Mbyte pages require two 2-Mbyte
entries; thus, the number of entries available for 4-Mbyte pages is one half the number of 2-Mbyte
page entries.
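
A useful way to read these TLB figures is as "reach": the amount of memory that can be mapped without a miss, which is simply entries times page size. The sketch below applies that to the data TLB numbers quoted above; the helper names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry maps a page of the given size."""
    return entries * page_bytes


def entries_for_4m_pages(entries_2m: int) -> int:
    """A 4-Mbyte page consumes two 2-Mbyte entries, per the text."""
    return entries_2m // 2


print(tlb_reach(32, 4 * KIB) // KIB, "KiB  (L1 data TLB, 4-Kbyte pages)")
print(tlb_reach(32, 2 * MIB) // MIB, "MiB  (L1 data TLB, 2-Mbyte pages)")
print(tlb_reach(1024, 4 * KIB) // MIB, "MiB  (L2 data TLB, 4-Kbyte pages)")
print(entries_for_4m_pages(32), "entries usable for 4-Mbyte pages in the L1 data TLB")
```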
 
 
7> Integer Unit
 
The integer unit consists of two components, the integer scheduler, which feeds the integer execution
pipes, and the integer execution unit, which carries out several types of operations discussed below.
The integer unit is duplicated for each thread pair.
 
7.1> Integer Scheduler
The scheduler can receive and schedule up to four micro-ops (μops) in a dispatch group per cycle.
The scheduler tracks operand availability and dependency information as part of its task of issuing
μops to be executed. It also assures that older μops which have been waiting for operands are
executed in a timely manner. The scheduler also manages register mapping and renaming.
 
7.2> Integer Execution Unit
There are four integer execution units per core: two units (EX) that handle all arithmetic, logical and
shift operations, and two units (AGLU) that handle address generation and simple ALU operations.
Figure 2 shows a block diagram for one integer cluster. There are two such integer clusters
per compute unit.
 
 http://i624.photobucket.com/albums/tt329/vietthanhpro/IEUK15.jpg
 
Macro-ops are broken down into micro-ops in the schedulers. Micro-ops are executed when their
operands are available, either from the register file or result buses. Micro-ops from a single operation
can execute out-of-order. In addition, a particular integer pipe can execute two micro-ops from
different macro-ops (one in the ALU and one in the AGLU) at the same time. (See Figure 1 on
page 32.) The scheduler can receive up to four macro-ops per cycle. This group of macro-ops is
called a dispatch group.
 
EX0 contains a variable latency non-pipelined integer divider. EX1 contains a pipelined integer
multiplier. The AGLUs contain a simple ALU to execute arithmetic and logical operations and
generate effective addresses. A load and store unit (LSU) reads and writes data to and from the L1
data cache. The integer scheduler sends a completion status to the ICU when the outstanding micro-ops
for a given macro-op are executed. (For more information on the LSU, see section 2.12 on page
38.)
 
L1 DTLB has been increased to 64M for AMD Family 15h Models 10h-1fh processors. For
AMD Family 15h models 20h to 2fh processors, the L1 DTLB size has increased from 32 entries to
64 entries.
 
The LZCNT and POPCNT operations are handled in a pipelined unit attached to EX0.
 
 
8> Floating-Point Unit
 
The AMD Family 15h processor floating point unit (FPU) was designed to provide four times the raw
FADD and FMUL bandwidth as the original AMD Opteron and Athlon 64 processors. It achieves this
by means of two 128-bit fused multiply-accumulate (FMAC) units which are supported by a 128-bit
high-bandwidth load-store system. The FPU is a coprocessor model that is shared between the two
cores of one AMD Family 15h compute unit. As such it contains its own scheduler, register files and
renamers and does not share them with the integer units. This decoupling provides optimal
performance of both the integer units and the FPU. In addition to the two FMACs, the FPU also
contains two 128-bit integer units which perform arithmetic and logical operations on AVX, MMX
and SSE packed integer data.
 
A 128-bit integer multiply accumulate (IMAC) unit is incorporated into FPU pipe 0. The IMAC
performs integer fused multiply and accumulate, and similar arithmetic operations on AVX, MMX
and SSE data. A crossbar (XBAR) unit is integrated into FPU pipe 1 to execute the permute
instruction along with shifts, packs/unpacks and shuffles. There is an FPU load-store unit which
supports up to two 128-bit loads and one 128-bit store per cycle.
 
FPU Features Summary and Specifications:
• The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the
   thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and
   completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be
   executed.
• Within the FPU, up to two loads per cycle can be accepted, possibly from different threads.
• There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit
   FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.
• Two 128-bit FMAC units. Each FMAC supports four single precision or two double-precision ops.
• FADDs and FMULs are implemented within the FMACs.
• x87 FADDs and FMULs are also handled by the FMAC.
• Each FMAC contains a variable latency divide/square root machine.
• Only 1 256-bit operation can issue per cycle, however an extra cycle can be incurred as in the case
   of a FastPath Double if both micro ops cannot issue together.
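
The FMAC figures in the summary above give a peak floating-point rate per compute unit. The sketch below counts a fused multiply-accumulate as two floating-point operations, which is a common convention rather than something the guide states, and the names are illustrative.

```python
FMAC_UNITS = 2          # two FMAC pipes per FPU, per the text
FMAC_WIDTH_BITS = 128   # width of each FMAC, per the text


def peak_flops_per_cycle(element_bits: int, flops_per_op: int = 2) -> int:
    """Peak floating-point operations per cycle for one shared FPU.

    flops_per_op=2 counts a fused multiply-accumulate as a multiply plus an add
    (an assumed convention); use 1 for plain FADD or FMUL.
    """
    lanes = FMAC_WIDTH_BITS // element_bits
    return FMAC_UNITS * lanes * flops_per_op


print(peak_flops_per_cycle(32))  # single precision, FMA
print(peak_flops_per_cycle(64))  # double precision, FMA
```

Since the FPU is shared between the two cores of a compute unit, these are per-compute-unit figures, not per-core figures.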
 
 http://i624.photobucket.com/albums/tt329/vietthanhpro/FPUK15h.jpg
 http://i624.photobucket.com/albums/tt329/vietthanhpro/FPUE.jpg
 
 
9> Load-Store Unit
 
The AMD family 15h processor load-store (LS) unit handles data accesses. There are two LS units
per compute unit, or one per core. The LS unit supports two 128-bit loads/cycle and one 128-bit
store/cycle. There is a 24 entry store queue. This queue buffers stored data until it can be written to
the data cache. The load queue has 40 entries and holds load operations until after the load has been
completed and delivered to the integer unit or the FPU. The LS unit is composed of two largely independent
pipelines enabling the execution of two memory operations per cycle.
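
The per-cycle load and store widths translate into a peak L1 bandwidth per core once a clock speed is chosen. The sketch below does that conversion; the clock value passed in is an arbitrary example, not a specification of any shipping part.

```python
LOAD_PORTS = 2          # 128-bit loads per cycle, per the text
STORE_PORTS = 1         # 128-bit stores per cycle, per the text
PORT_BYTES = 128 // 8   # 16 bytes per 128-bit access


def ls_peak_bytes_per_cycle() -> tuple[int, int]:
    """(load bytes, store bytes) one LS unit can move per cycle at peak."""
    return LOAD_PORTS * PORT_BYTES, STORE_PORTS * PORT_BYTES


def ls_peak_gbytes_per_s(clock_ghz: float) -> tuple[float, float]:
    """Peak (load, store) bandwidth in Gbyte/s at a given core clock."""
    loads, stores = ls_peak_bytes_per_cycle()
    return loads * clock_ghz, stores * clock_ghz


print(ls_peak_bytes_per_cycle())
print(ls_peak_gbytes_per_s(2.0))  # hypothetical 2.0 GHz core clock
```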
 
Finally, the LS unit helps ensure that the architectural load and store ordering rules are preserved
(a requirement for AMD64 architecture compatibility).
 
http://i624.photobucket.com/albums/tt329/vietthanhpro/LSUK15h.jpg
 
 
10> Write Combining
 
AMD Family 15h processors provide four write-combining data buffers that allow four simultaneous
streams.
 
A Write Coalescing Cache (WCC) has been incorporated into the AMD family 15h
microarchitecture. The WCC is 4 KB in size and is 4-way set associative. Stores to cacheable memory
and, thus, to the L2 cache are coalesced in this cache.
 
 
11> Integrated Memory Controller
 
AMD Family 15h processors provide integrated low-latency, high-bandwidth DDR3 memory
controllers.
 
The memory controller supports:
• DRAM chips that are 4, 8, and 16 bits wide within a DIMM.
• Interleaving memory within DIMMs.
• ECC checking with single symbol correcting and double symbol detecting.
• Dual-independent 64-bit channel operation.
• Optimized scheduling algorithms and access pattern predictors to improve latency and achieved
  bandwidth, particularly for interleaved streams of read and write DRAM accesses.
• A data prefetcher.
  Prefetched data is held in the memory controller itself and is not speculatively filled into the L1, L2,
  or L3 caches. This prefetcher is able to capture both positive and negative stride values (both unit and
  non-unit) of cache-line size, as well as some more complicated access patterns.
  For specifications on a certain processor's memory controller, see the data sheet for that processor.
  For information on how to program the memory controller.

 
 
12> HyperTransport™ Technology Interface
 
Supports HT 3.x and HyperTransport Assist.
 
Additional features in the AMD Family 15h HyperTransport implementation may include:
• HyperTransport link bandwidth balancing, allowing multiple HyperTransport links to be teamed
  to carry coherent traffic.
• HyperTransport Link Splitting, which allows a single 16-bit link to be split into two 8-bit links.
  These features allow for further optimized platform designs that are capable of increasing system
  bandwidth and reducing latency.
 
 
13> AMD Virtualization Optimizations
 
• The advantages of using nested paging instead of shadow paging
• Guest page attribute table (PAT) configuration
• State swapping
• Economizing Interceptions
• Nested page and shadow page size
• TLB control and flushing in shadow pages
• Instruction Fetch for Intercepted (REP) INS instructions
• Sharing IOIO and MSR protection masks
• CPUID
• Time resources
• Paravirtualized resources

n°7877910
Marc
Hunter of joce & sly
Posted on 25-04-2011 at 19:08:53
 

You're not going to copy/paste all 358 pages of the Software Optimization Guide for AMD Family 15h Processors on us, are you?

n°7877912
EpinardsHachés
Posted on 25-04-2011 at 19:11:11
 

Wirmish, couldn't you shorten it and put it in French, please?
 
[:nicoozz]

n°7877918
Gigathlon
Native quad-neuron
Posted on 25-04-2011 at 19:15:56
 


To sum up, it doesn't change much, since temperature is broadly proportional to power draw. It can introduce bias, but it won't radically change the profile.

n°7877920
Fouge
Posted on 25-04-2011 at 19:16:14
 

Wirmish wrote:

AMD Family 15h version 1  AND version 2:

Page 20: AMD Family 15h processors have multiple compute units, each containing its own L2 cache and two cores. The cores share their compute unit’s L2 cache. Each core incorporates the complete x86 instruction set logic and L1 data cache. Compute units share the processor’s L3 cache and Northbridge (see Chapter 2, Microarchitecture of AMD Family 15h Processors).
[...]

All of this seems to be taken from this PDF released at the start of the month:
Software Optimization Guide for AMD Family 15h Processors

edit: beaten to it by Marc :o


Message edited by Fouge on 25-04-2011 at 19:17:01
n°7877928
gliterr
Posted on 25-04-2011 at 19:19:40

Wirmish wrote:

Since when, I'm not sure, but that's what Bulldozer does will do.


I took the liberty of correcting that.

MEI wrote:

Intel has had this since Nehalem.


And why isn't that data available to software? That's a bit of a shame, isn't it?
Especially since AMD claims just the opposite.

Gigathlon wrote:


To sum up, it doesn't change much, since temperature is broadly proportional to power draw. It can skew things, but it won't radically change the profile.


That's my question. We know how to build an ultra-precise temperature sensor.
Instantaneous power draw has to be calculated.


Message edited by gliterr on 25-04-2011 at 19:24:21
n°7877940
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 19:22:44

Marc wrote:

You're not going to copy/paste all 358 pages of the Software Optimization Guide for AMD Family 15h Processors on us, are you?


It crossed my mind, but I figured I'd be better off copying only the interesting bits.  :whistle:

n°7877949
gliterr
Posted on 25-04-2011 at 19:26:49

Couldn't you have done that back when we were talking about it?
Couldn't you post only the interesting parts?

n°7877956
Invite_Sur​prise
Racaille de Shanghaï
Posted on 25-04-2011 at 19:29:17

Wirmish wrote:


It crossed my mind, but I figured I'd be better off copying only the interesting bits.  :whistle:


Sure, posting walls of text with no link really shows your level ... [:implosion du tibia]

n°7877961
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 19:31:36

Invite_Surprise wrote:

Sure, posting walls of text with no link really shows your level ... [:implosion du tibia]

I have a reputation to maintain.  [:clooney4]

n°7877963
Activation
21:9 kill Surround Gaming
Posted on 25-04-2011 at 19:33:30

EpinardsHachés wrote:

Wirmish, couldn't you cut it down, and in French please

[:nicoozz]


No, you're not the one who leaves, he is  [:elfenyu]

n°7877964
Fouge
Posted on 25-04-2011 at 19:33:44

gliterr wrote:

Couldn't you have done that back when we were talking about it?
Couldn't you post only the interesting parts?

It's just a copy/paste of this post dated April 13...
Bulldozer (V1 and V2) - The (partial) Scoop

n°7877966
Invite_Sur​prise
Racaille de Shanghaï
Posted on 25-04-2011 at 19:34:52

Wirmish wrote:

I have a reputation to maintain.  [:clooney4]


[:peillon]

n°7877967
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 19:35:29

Fouge wrote:

It's just a copy/paste of this post dated April 13...
Bulldozer (V1 and V2) - The (partial) Scoop


It's not just that.
I removed the useless stuff.

Message quoted 1 time
Message edited by Wirmish on 25-04-2011 at 19:35:45
n°7877970
Invite_Sur​prise
Racaille de Shanghaï
Posted on 25-04-2011 at 19:36:14

Wirmish wrote:


It's not just that.
I removed the useless stuff.


You're ridiculous ...

n°7877975
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 19:39:13

And there isn't just this wall of text... there's the other one.

n°7877978
Fouge
Posted on 25-04-2011 at 19:40:57

Wirmish wrote:

And there isn't just this wall of text... there's the other one.

The other one? The copy/paste of this post dated April 10?
http://amdk11.blog.de/2011/04/10/m [...] -10981756/

n°7877987
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 19:43:20

Fouge wrote:

The other one? The copy/paste of this post dated April 10?
http://amdk11.blog.de/2011/04/10/m [...] -10981756/


You know you're the King of Google, right!  [:sud_conscient:2]

I didn't just copy-paste mindlessly. I improved the layout.  [:ash ray cure:3]

n°7878000
Marc
Chasseur de joce & sly
Posted on 25-04-2011 at 19:54:33

Gigathlon wrote:


To sum up, it doesn't change much, since temperature is broadly proportional to power draw. It can skew things, but it won't radically change the profile.


:heink:

And where do you get the idea that the value stored in the Intel CPU register that reports energy consumption in joules is based on temperature?

n°7878007
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 20:01:49

It's a hidden feature. [:eneytihi:3]

n°7878046
Deleted profile
Posted on 25-04-2011 at 20:33:38

The Mulldozer feature [:degueulasse_gout]
 [:ouam]  [:onizuka_dark]

n°7878051
Gigathlon
Quad-neurones natif
Posted on 25-04-2011 at 20:41:45

Marc wrote:

And where do you get the idea that the value stored in the Intel CPU register that reports energy consumption in joules is based on temperature?


But is that really the data used to drive Intel-style turbo, rather than a much heavier "debug" tool?

If it stores joules, that register is probably filled at the scheduler level from the instructions it issues (assuming they are after accuracy; otherwise it could be at the decoder level), so it is a mathematical function rather than a precise measurement.

Message quoted 2 times
Message edited by Gigathlon on 25-04-2011 at 20:44:43
n°7878075
Marc
Chasseur de joce & sly
Posted on 25-04-2011 at 20:54:42

Gigathlon wrote:


But is that really the data used to drive Intel-style turbo, rather than a much heavier "debug" tool?


Even so, where do you get the idea that Turbo is based on temperature and not on power draw? :D

Message quoted 2 times
Message edited by Marc on 25-04-2011 at 21:16:17
n°7878098
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 21:10:01

Quote:

Wiki: The increased clock rate is limited by the processor's power, current and thermal limits, as well as the number of active cores and the maximum frequency of the active cores.

Quote:

Intel: Intel Turbo Boost Technology 2.0 automatically allows processor cores to run faster than the base operating frequency if it's operating below power, current, and temperature specification limits.
 
The maximum frequency of Intel Turbo Boost Technology 2.0 is dependent on the number of active cores. The amount of time the processor spends in the Intel Turbo Boost Technology 2.0 state depends on the workload and operating environment.
 
Any of the following can set the upper limit of Intel Turbo Boost Technology 2.0 on a given workload:
· Number of active cores
· Estimated current consumption
· Estimated power consumption
· Processor temperature
 
When the processor is operating below these limits and the user's workload demands additional performance, the processor frequency will dynamically increase until the upper limit of frequency is reached. Intel Turbo Boost Technology 2.0 has multiple algorithms operating in parallel to manage current, power, and temperature to maximize performance and energy efficiency.
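To make the limit list above concrete, here is a minimal sketch of a controller that raises the multiplier one step at a time while every limit has headroom and backs off when any limit is reached. It is only an illustration of the general idea: the class names, the one-bin step policy and all the numbers are assumptions, not Intel's actual algorithm or real specification values.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_current_a: float              # current limit (hypothetical value below)
    max_power_w: float                # power limit (hypothetical value below)
    max_temp_c: float                 # temperature limit (hypothetical value below)
    max_ratio_by_active_cores: dict   # active cores -> highest allowed multiplier


@dataclass
class Reading:
    active_cores: int
    est_current_a: float
    est_power_w: float
    temp_c: float


def next_ratio(ratio, base_ratio, reading, limits):
    """Pick the multiplier for the next interval (assumed one-step policy)."""
    over_limit = (
        reading.est_current_a >= limits.max_current_a
        or reading.est_power_w >= limits.max_power_w
        or reading.temp_c >= limits.max_temp_c
    )
    if over_limit:
        # Any single limit is enough to step back down, never below base.
        return max(base_ratio, ratio - 1)
    # Otherwise climb, capped by the ceiling for the current active-core count.
    ceiling = limits.max_ratio_by_active_cores[reading.active_cores]
    return min(ceiling, ratio + 1)


# Made-up limits for a hypothetical quad-core with a base multiplier of 34.
limits = Limits(112.0, 95.0, 98.0, {1: 38, 2: 37, 3: 36, 4: 35})
print(next_ratio(34, 34, Reading(4, 80.0, 70.0, 60.0), limits))
print(next_ratio(38, 34, Reading(1, 80.0, 96.0, 60.0), limits))
```

The first call has headroom on all three limits and steps up; the second is over the power limit and steps down even though current and temperature are fine.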


Message edited by Wirmish on 25-04-2011 at 21:12:53
n°7878108
canard rou​ge
coin coin
Posted on 25-04-2011 at 21:14:15

Wirmish is a cyborg enhanced with AMD products, so why are you trying to beat him on his own turf, where he's sure to win? :o

Even Robocop doesn't dare take him on head-on anymore. :o


---------------
Do like the little ducks, and to get everyone laughing, shake your tail feathers while going quack-quack
n°7878109
MEI
|DarthPingoo(tm)|
Posté le 25-04-2011 à 21:15:14  profilanswer
 

Gigathlon a écrit :


Mais est-ce bien la donnée utilisée pour gérer le turbo sauce Intel, et pas plutôt un outil "debug" beaucoup plus lourd?
 
Si il stocke des Joules, c'est probablement que ce registre est rempli au niveau du scheduler d'après les instructions qui en sortent (dans l'hypothèse où ils cherchent la précision, sinon ça peut être au niveau du décodeur), donc une fonction mathématique et non une mesure précise.


Même si c'est une mesure empirique, AMD ne doit pas faire beaucoup mieux hein. C'est pas parce que tu sort une mesure en W au lieu de sortir des Joules qu'elle est plus juste.
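Joules and watts are two views of the same data: average power is just the energy difference divided by the elapsed time. A small sketch, assuming a free-running energy counter that wraps around; the function name, the 32-bit width and the 2^-16 J per count unit are illustrative assumptions here, not something stated in this thread:

```python
def average_power_w(count_start, count_end, joules_per_count, seconds, counter_bits=32):
    """Average power between two readings of a wrapping energy counter (assumed layout)."""
    # The modulo keeps the difference correct across a single wraparound.
    delta_counts = (count_end - count_start) % (1 << counter_bits)
    return delta_counts * joules_per_count / seconds


UNIT_J = 1.0 / 65536  # hypothetical energy unit: 2^-16 J per count

# 50 J accumulated over one second.
steady = average_power_w(1_000_000, 1_000_000 + 50 * 65536, UNIT_J, 1.0)

# The counter wrapped between the two readings; 2 J accumulated over one second.
wrapped = average_power_w(2**32 - 65536, 65536, UNIT_J, 1.0)
print(steady, wrapped)
```

Whichever unit is exposed, the accuracy depends on how the counter is filled, not on whether the result is shown as energy or power.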


---------------
| AMD Ryzen 7 7700X 8C/16T @ 4.5-5.4GHz - 64GB DDR5-6000 30-40-40 1T - AMD Radeon RX 7900 XTX 24GB @ 2680MHz/20Gbps |
n°7878118
Gigathlon
Quad-neurones natif
Posted on 25-04-2011 at 21:20:03

Marc wrote:

Even so, where do you get the idea that Turbo is based on temperature and not on power draw? :D


I'm working on the assumption that the statement "our turbo works with measured power draw, not temperature like a certain competitor (whom I won't name and who holds 90% of the CPU market, since VIA can't decently be considered a competitor, and neither can IBM)" is well-founded :o


Message edited by Gigathlon on 25-04-2011 at 21:20:47
n°7878120
Marc
Chasseur de joce & sly
Posted on 25-04-2011 at 21:21:10

[:hahaguy] You should be ashamed :o

Intel's Turbo works with maximum values for power (W), current (A) (the two are linked, granted) and temperature.
The first two values can be changed on CPUs that allow it, but not the last one.

So I don't really see what's new.

Message quoted 2 times
Message edited by Marc on 25-04-2011 at 21:23:06
n°7878133
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 21:31:21

canard rouge wrote:

Wirmish is a cyborg enhanced with AMD products, so why are you trying to beat him on his own turf, where he's sure to win? :o

Even Robocop doesn't dare take him on head-on anymore. :o

Robocop, that's me.

http://filesmelt.com/dl/wirmish-Robocop1.jpg

Marc wrote:

Intel's Turbo works with maximum values for power (W), current (A) (the two are linked, granted) and temperature... as well as the number of active cores.


All of this has been tested inside and out -> PDF

Message quoted 1 time
Message edited by Wirmish on 25-04-2011 at 21:41:10
n°7878137
barbare128
pas de koi se rouler par terre
Posted on 25-04-2011 at 21:32:55

Who cares, the point is that turbo is junk, it's only there to look good in benchmarks that last under 60 s ...

lol, come on


---------------
Feed my back : http://forum.hardware.fr/forum2.ph [...] w=0&nojs=0
n°7878149
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 21:37:28

And is Bulldozer's 1 GHz turbo junk too?  [:ummon]  [:chewee297:3]

n°7878150
Fouge
Posted on 25-04-2011 at 21:38:17

barbare128 wrote:

Who cares, the point is that turbo is junk, it's only there to look good in benchmarks that last under 60 s ...

lol, come on

We'll see how it turns out on Bulldozer: going from 3 to 4 GHz is a gain of about 33%. And provided the cooling is adequate, if it holds for 10 s, it will hold for 10 min [:razorbak83]
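For reference, the arithmetic behind that figure, as a tiny sketch (the function name is made up; the 3 and 4 GHz values are the ones quoted in the post, not confirmed specs):

```python
def relative_gain(base_ghz, turbo_ghz):
    """Clock gain of turbo over base, as a fraction of the base clock."""
    return (turbo_ghz - base_ghz) / base_ghz


gain = relative_gain(3.0, 4.0)
print(f"{gain:.1%}")
```

A 1 GHz step on a 3 GHz base is one third of the base clock; whether performance follows the clock one-for-one depends on the workload.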

n°7878153
gliterr
Posted on 25-04-2011 at 21:40:04

Marc wrote:

Even so, where do you get the idea that Turbo is based on temperature and not on power draw? :D


From AMD's article on Intel's turbo.

n°7878154
Gigathlon
Quad-neurones natif
Posted on 25-04-2011 at 21:40:23

Marc wrote:

So I don't really see what's new.


There isn't anything new, it's just another way of doing the same thing... or not :o

Depending on how the power estimates are made, AMD may have a (slim) advantage, but that has absolutely nothing to do with what the text ties it to (watch out, with your SBs, if the CPU gets too hot because it's 40°C you'll lose turbo, but with us, not at all! <- to be read with the WV ad in mind, that's how absurd it is).

n°7878160
Wirmish
¡sıɹdɹns zǝɹǝs snoʌ
Posted on 25-04-2011 at 21:42:57

The idea is that, since AMD's turbo focuses mainly on power draw, the LN2 overclockers are going to have a blast... by disabling turbo.


Spoiler:

[:xolth] ???


Message edited by Wirmish on 25-04-2011 at 21:46:10
n°7878171
MEI
|DarthPingoo(tm)|
Posted on 25-04-2011 at 21:52:17

Fouge wrote:

We'll see how it turns out on Bulldozer: going from 3 to 4 GHz is a gain of about 33%. And provided the cooling is adequate, if it holds for 10 s, it will hold for 10 min [:razorbak83]


The thing is, in practice, even on a laptop you manage to get several minutes in turbo, even with all cores active.


---------------
| AMD Ryzen 7 7700X 8C/16T @ 4.5-5.4GHz - 64GB DDR5-6000 30-40-40 1T - AMD Radeon RX 7900 XTX 24GB @ 2680MHz/20Gbps |
n°7878219
Yop_Yop_Yo​p
Posted on 25-04-2011 at 22:34:25

Good evening.

I don't want to stop anyone from dreaming; besides, dreaming of what, in the end?
But that's another debate.
You don't have to read this.

If we do a calculation like one I saw in a post by Wirmish, the Cinebench test is real if, roughly,

- at 8C/8T the tested BD runs at 10 or 11 GHz
- at 16C/16T the tested BD runs at 5 or 5.5 GHz

So it's plausible on a dual-socket motherboard capable of overclocking (?), given that ES chips were apparently running at 4 GHz. But we didn't know their core count, I think....

All these calculations are based on the claim "33% more cores -> 50% more performance" (compared to the previous generation).
In performance per clock, to keep it short, I need to go to bed.

Then again, if the gain is bigger it could be true, but clearly not at 8C/8T.
16C/16T is more plausible.
?

Anyway, all this is just a game to pass the time.
Dust... all of it.
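The kind of back-of-the-envelope estimate described above can be sketched as follows, assuming performance scales linearly with both core count and clock. The function names are made up, and the target score and per-core figure are placeholders picked only so the results land in the ranges quoted in the post; they are not real Cinebench numbers.

```python
def per_core_gain(core_increase, perf_increase):
    """Per-core speedup implied by 'X% more cores -> Y% more performance' at equal clocks."""
    return (1.0 + perf_increase) / (1.0 + core_increase)


def required_clock_ghz(target_score, cores, score_per_core_per_ghz):
    """Clock needed to hit a score if it scales linearly with cores and clock (assumption)."""
    return target_score / (cores * score_per_core_per_ghz)


# "33% more cores -> 50% more performance" works out to about 1.125x per core.
gain = per_core_gain(1.0 / 3.0, 0.5)

# Placeholder score and per-core throughput, for illustration only.
clock_8c = required_clock_ghz(84.0, 8, 1.0)
clock_16c = required_clock_ghz(84.0, 16, 1.0)
print(gain, clock_8c, clock_16c)
```

Under linear scaling, doubling the core count halves the clock needed for the same score, which is why the 16C/16T figure is half the 8C/8T one.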

n°7878237
barbare128
pas de koi se rouler par terre
Posted on 25-04-2011 at 22:53:33

Yop_Yop_Yop wrote:

Good evening.

I don't want to stop anyone from dreaming; besides, dreaming of what, in the end?
But that's another debate.
You don't have to read this.

If we do a calculation like one I saw in a post by Wirmish, the Cinebench test is real if, roughly,

- at 8C/8T the tested BD runs at 10 or 11 GHz
- at 16C/16T the tested BD runs at 5 or 5.5 GHz

So it's plausible on a dual-socket motherboard capable of overclocking (?), given that ES chips were apparently running at 4 GHz. But we didn't know their core count, I think....

All these calculations are based on the claim "33% more cores -> 50% more performance" (compared to the previous generation).
In performance per clock, to keep it short, I need to go to bed.

Then again, if the gain is bigger it could be true, but clearly not at 8C/8T.
16C/16T is more plausible.
?

Anyway, all this is just a game to pass the time.
Dust... all of it.


That's all very nice, calculations hypothetic
on benchmarks synthetic.


rofl it rhymes, it's modern poetry  :love:

Message quoted 1 time
Message edited by barbare128 on 25-04-2011 at 22:53:40

---------------
Feed my back : http://forum.hardware.fr/forum2.ph [...] w=0&nojs=0