
Single instruction, multiple threads

Single instruction, multiple threads (SIMT) is an execution model used in parallel computing in which single instruction, multiple data (SIMD) is combined with multithreading.

Overview

A set of processors, say p of them, appears to execute many more than p tasks. Each processor achieves this by maintaining multiple "threads" (also called "work-items" or "sequences of SIMD-lane operations"), which execute in lock-step and are analogous to SIMD lanes.[1]
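As a toy illustration (plain Python, with hypothetical names, not any vendor's API), a single SIMT "processor" can be pictured as broadcasting one instruction to several lock-step threads, each of which applies it to its own data element, much like SIMD lanes:

```python
# Toy model of one SIMT processor: a single instruction (here, a
# function) is broadcast to all threads, which apply it in lock-step,
# each to its own per-thread data element -- analogous to SIMD lanes.

def execute_in_lockstep(instruction, per_thread_data):
    """Apply one instruction across all threads in the same step."""
    return [instruction(x) for x in per_thread_data]

# Four threads, one instruction ("multiply by 2"), one step:
data = [1, 2, 3, 4]
result = execute_in_lockstep(lambda x: x * 2, data)
print(result)  # [2, 4, 6, 8]
```

One instruction fetch thus serves all threads of the processor, which is the fetch-overhead saving discussed below.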

The SIMT execution model has been implemented on several GPUs and is relevant for general-purpose computing on graphics processing units (GPGPU); for example, some supercomputers combine CPUs with GPUs.

SIMT was introduced by Nvidia:[2] [3]

Nvidia's Tesla GPU microarchitecture (first available November 8, 2006 as implemented in the "G80" GPU chip) introduced the single-instruction multiple-thread (SIMT) execution model where multiple independent threads execute concurrently using a single instruction.

ATI Technologies (now AMD) released a competing product slightly later on May 14, 2007, the TeraScale 1-based "R600" GPU chip.

As the access time of all the widespread RAM types (e.g. DDR SDRAM, GDDR SDRAM, XDR DRAM, etc.) is still relatively high, engineers came up with the idea of hiding the latency that inevitably comes with each memory access. Strictly speaking, latency hiding is a feature of the zero-overhead scheduling implemented by modern GPUs; it might or might not be considered a property of SIMT itself.

SIMT is intended to limit instruction fetching overhead,[4] i.e. the latency that comes with memory access, and is used in modern GPUs (such as those from Nvidia and AMD) in combination with latency hiding to enable high-performance execution despite considerable latency in memory-access operations. Here the processor is oversubscribed with computation tasks and can switch quickly to another task whenever it would otherwise have to wait on memory. This strategy is comparable to multithreading in CPUs (not to be confused with multi-core).[5]
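The oversubscription strategy can be sketched as a tiny scheduler simulation (plain Python, hypothetical names; real GPU warp schedulers are hardware mechanisms, not software loops). When the current group of threads issues a memory access it is parked for `latency` cycles, and the scheduler immediately switches to another ready group instead of idling:

```python
# Sketch of latency hiding: the scheduler is oversubscribed with
# "warps" (groups of lock-step threads).  When a warp issues a memory
# access ("load"), it is parked until the access completes, and the
# scheduler switches to another ready warp on the very next cycle.

from collections import deque

def run(warps, latency=3):
    """Each warp is a list of ops, either "compute" or "load".
    Returns a trace of (cycle, warp_id, op) tuples."""
    ready = deque((i, deque(ops)) for i, ops in enumerate(warps))
    waiting = []  # entries: (wake_cycle, warp_id, remaining_ops)
    trace, cycle = [], 0
    while ready or waiting:
        # Wake any warps whose memory access has completed.
        still_waiting = []
        for wake, wid, ops in waiting:
            if wake <= cycle:
                ready.append((wid, ops))
            else:
                still_waiting.append((wake, wid, ops))
        waiting = still_waiting
        if ready:
            wid, ops = ready.popleft()
            op = ops.popleft()
            trace.append((cycle, wid, op))
            if op == "load":
                if ops:  # park this warp while the access is in flight
                    waiting.append((cycle + latency, wid, ops))
            elif ops:
                ready.append((wid, ops))
        cycle += 1
    return trace

trace = run([["load", "compute"], ["compute", "compute"]], latency=2)
print(trace)
# [(0, 0, 'load'), (1, 1, 'compute'), (2, 1, 'compute'), (3, 0, 'compute')]
```

With a single warp, the processor would stall for `latency` cycles after each load; with enough warps, some warp can issue useful work on every cycle, hiding the memory latency.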

A downside of SIMT execution is that thread-specific control flow is implemented using "masking", which leads to poor utilization when a processor's threads follow different control-flow paths. For instance, to handle an IF-ELSE block where various threads of a processor take different paths, all threads must actually process both paths (since all threads of a processor always execute in lock-step), and masking is used to disable and enable the threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and it has the benefit of inexpensive synchronization between the threads of a processor.[6]
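The masking mechanism can be sketched in plain Python (hypothetical names; a simulation of the idea, not hardware behavior). All lanes step through both sides of the IF-ELSE in lock-step, and a per-thread mask decides whose results are kept on each pass:

```python
# Sketch of SIMT branch divergence handled by masking.  Every thread
# executes BOTH paths of the IF-ELSE in lock-step; a per-thread mask
# enables result writes only for the threads whose condition selects
# the path currently being executed.

def simt_if_else(cond, then_op, else_op, data):
    """Per lane: `then_op(x)` where cond(x) holds, `else_op(x)` elsewhere,
    computed the way a lock-step machine would -- both paths, masked."""
    mask = [cond(x) for x in data]  # per-thread predicate
    out = list(data)
    # Pass 1: THEN path -- all lanes compute; only masked-on lanes keep it.
    then_vals = [then_op(x) for x in data]
    out = [tv if m else o for tv, m, o in zip(then_vals, mask, out)]
    # Pass 2: ELSE path -- mask inverted; again all lanes compute.
    else_vals = [else_op(x) for x in data]
    out = [ev if not m else o for ev, m, o in zip(else_vals, mask, out)]
    return out

# if x % 2 == 0: x * 10  else: x + 100, over four lanes:
print(simt_if_else(lambda x: x % 2 == 0,
                   lambda x: x * 10,
                   lambda x: x + 100,
                   [1, 2, 3, 4]))  # [101, 20, 103, 40]
```

Note that both `then_vals` and `else_vals` are computed for every lane regardless of the mask: this is exactly the utilization loss the text describes, and it disappears when all lanes agree on the branch (coherent control flow).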

Content from Wikipedia Licensed under CC-BY-SA.


Thread (computing)

topic

A process with two threads of execution, running on one processor In computer science , a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler , which is typically a part of the operating system . The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. Multiple threads can exist within one process, executing concurrently and sharing resources such as memory , while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its variables at any given time. Single vs multiprocessor systems Systems with a single processor generally implement multithreading by time slicing : the central processing unit (CPU) switches between different software threads. This context switching generally happens very often and rapidly enough that users perceive the threads or tasks as running in parallel. On a mul ...more...



Multithreading (computer architecture)

topic

A process with two threads of execution, running on a single processor In computer architecture , multithreading is the ability of a central processing unit (CPU) or a single core in a multi-core processor to execute multiple processes or threads concurrently, appropriately supported by the operating system . This approach differs from multiprocessing , as with multithreading the processes and threads share the resources of a single or multiple cores: the computing units, the CPU caches , and the translation lookaside buffer (TLB). Where multiprocessing systems include multiple complete processing units, multithreading aims to increase utilization of a single core by using thread-level as well as instruction-level parallelism. As the two techniques are complementary, they are sometimes combined in systems with multiple multithreading CPUs and in CPUs with multiple multithreading cores. Overview The multithreading paradigm has become more popular as efforts to further exploit instruction-level parallelism have ...more...



Central processing unit

topic

An Intel 80486DX2 CPU, as seen from above Bottom side of an Intel 80486DX2 , showing its pins A central processing unit ( CPU ) is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic , logical, control and input/output (I/O) operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s. Traditionally, the term "CPU" refers to a processor , more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry. The form, design , and implementation of CPUs have changed over the course of their history, but their fundamental operation remains almost unchanged. Principal components of a CPU include the arithmetic logic unit (ALU) that performs arithmetic and logic operations , processor registers that supply operands to the ALU and store the results of ...more...



Flynn's taxonomy

topic

Flynn's taxonomy is a classification of computer architectures , proposed by Michael J. Flynn in 1966. The classification system has stuck, and has been used as a tool in design of modern processors and their functionalities. Since the rise of multiprocessing central processing units (CPUs), a multiprogramming context has evolved as an extension of the classification system. Classifications The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) streams and data streams available in the architecture. Single instruction stream single data stream (SISD) A sequential computer which exploits no parallelism in either the instruction or data streams. Single control unit (CU) fetches single instruction stream (IS) from memory. The CU then generates appropriate control signals to direct single processing element (PE) to operate on single data stream (DS) i.e., one operation at a time. Examples of SISD architecture are the traditional uniprocessor machines like olde ...more...



Simultaneous multithreading

topic

Simultaneous multithreading ( SMT ) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading . SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures . Details The name multithreading is ambiguous, because not only can multiple threads be executed simultaneously on one CPU core, but also multiple tasks (with different page tables , different task state segments , different protection rings , different I/O permissions , etc.). Although running on the same core, they are completely separated from each other. Multithreading is similar in concept to preemptive multitasking but is implemented at the thread level of execution in modern superscalar processors. Simultaneous multithreading (SMT) is one of the two main implementations of multithreading, the other form being temporal multithreading (also known as super-threading). In temporal multithreading, only one thread of instructions can execute in ...more...



Barrel processor

topic

A barrel processor is a CPU that switches between threads of execution on every cycle . This CPU design technique is also known as "interleaved" or "fine-grained" temporal multithreading . Unlike simultaneous multithreading in modern superscalar architectures, it generally does not allow execution of multiple instructions in one cycle. Like preemptive multitasking , each thread of execution is assigned its own program counter and other hardware registers (each thread's architectural state ). A barrel processor can guarantee that each thread will execute one instruction every n cycles, unlike a preemptive multitasking machine, that typically runs one thread of execution for hundreds or thousands of cycles, while all other threads wait their turn. A technique called C-slowing can automatically generate a corresponding barrel processor design from a single-tasking processor design. An n-way barrel processor generated this way acts much like n separate multiprocessing copies of the original single-tasking process ...more...



Parallel computing

topic

Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level , instruction-level , data , and task parallelism . Parallelism has been employed for many years, mainly in high-performance computing , but interest in it has grown lately due to the physical constraints preventing frequency scaling . As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture , mainly in the form of multi-core processors . Parallel computing is closely related to concurrent computing —they are frequently used together, and often conflated, though the two are distinct: it is possible to have parallelism without concurrency (such as bit-level parallelism ), and concurrency wi ...more...



SPMD

topic

In computing , SPMD ( single program, multiple data ) is a technique employed to achieve parallelism ; it is a subcategory of MIMD . Tasks are split up and run simultaneously on multiple processors with different input in order to obtain results faster. SPMD is the most common style of parallel programming. It is also a prerequisite for research concepts such as active messages and distributed shared memory . SPMD vs SIMD In SPMD, multiple autonomous processors simultaneously execute the same program at independent points, rather than in the lockstep that SIMD imposes on different data. With SPMD, tasks can be executed on general purpose CPUs ; SIMD requires vector processors to manipulate data streams. Note that the two are not mutually exclusive. Distributed memory SPMD usually refers to message passing programming on distributed memory computer architectures. A distributed memory computer consists of a collection of independent computers, called nodes. Each node starts its own program and communicates wit ...more...



Thread block

topic

A thread block is a programming abstraction that represents a group of threads that can be executing serially or in parallel. For better process and data mapping, threads are grouped into thread blocks. The number of threads varies with available shared memory. 'The number of threads in a thread block is also limited by the architecture to a total of 512 threads per block. ' The threads in the same thread block run on the same stream processor. Threads in the same block can communicate with each other via shared memory , barrier synchronization or other synchronization primitives such as atomic operations. Multiple blocks are combined to form a grid. All the blocks in the same grid contain the same number of threads. Since the number of threads in a block is limited to 512, grids can be used for computations that require a large number of thread blocks to operate in parallel. CUDA is a parallel computing platform and programming model that higher level languages can use to exploit parallelism. In CUDA, the ke ...more...



Hyper-threading

topic

In this high-level depiction of HTT, instructions are fetched from RAM (differently colored boxes represent the instructions of four different programs), decoded and reordered by the front end (white boxes represent pipeline bubbles ), and passed to the execution core capable of executing instructions from two different programs during the same clock cycle . Hyper-threading (officially called Hyper-Threading Technology or HT Technology , and abbreviated as HTT or HT ) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors. It first appeared in February 2002 on Xeon server processors and in November 2002 on Pentium 4 desktop CPUs. Later, Intel included this technology in Itanium , Atom , and Core 'i' Series CPUs, among others. For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possi ...more...



Threading (manufacturing)

topic

Threading is the process of creating a screw thread . More screw threads are produced each year than any other machine element . There are many methods of generating threads, including subtractive methods (many kinds of thread cutting and grinding, as detailed below); deformative or transformative methods (rolling and forming; molding and casting); additive methods (such as 3D printing ); or combinations thereof. Overview of methods (comparison, selection, etc.) There are various methods for generating screw threads. The method chosen for any one application is chosen based on constraints—time, money, degree of precision needed (or not needed), what equipment is already available, what equipment purchases could be justified based on resulting unit price of the threaded part (which depends on how many parts are planned), etc. In general, certain thread-generating processes tend to fall along certain portions of the spectrum from toolroom -made parts to mass-produced parts, although there can be considerable o ...more...



Superscalar processor

topic

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed. (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back, i = Instruction number, t = Clock cycle [i.e., time]) Processor board of a CRAY T3e supercomputer with four superscalar Alpha 21164 processors A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor that can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows for more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate . Each execution unit is not a separate processo ...more...



Memory barrier

topic

A memory barrier , also known as a membar , memory fence or fence instruction , is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier. Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution . This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution , but can cause unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent and defined by the architecture's memory ordering model . Some architectures provide multiple barriers for enforcing different ordering constraints. Memory barriers are typically used whe ...more...



Program counter

topic

The program counter ( PC ), commonly called the instruction pointer ( IP ) in Intel x86 and Itanium microprocessors , and sometimes called the instruction address register ( IAR ), the instruction counter , or just part of the instruction sequencer, is a processor register that indicates where a computer is in its program sequence. In most processors, the PC is incremented after fetching an instruction , and holds the memory address of (" points to") the next instruction that would be executed. (In a processor where the incrementation precedes the fetch, the PC points to the current instruction being executed.) Processors usually fetch instructions sequentially from memory, but control transfer instructions change the sequence by placing a new value in the PC. These include branches (sometimes called jumps), subroutine calls, and returns . A transfer that is conditional on the truth of some assertion lets the computer follow a different sequence under different conditions. A branch provides that the next i ...more...



Multi-core processor

topic

Diagram of a generic dual-core processor with CPU-local level-1 caches and a shared, on-die level-2 cache. An Intel Core 2 Duo E6750 dual-core processor. An AMD Athlon X2 6400+ dual-core processor. A multi-core processor is a single computing component with two or more independent processing units called cores, which read and execute program instructions . The instructions are ordinary CPU instructions (such as add, move data, and branch) but the single processor can run multiple instructions on separate cores at the same time, increasing overall speed for programs amenable to parallel computing . Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP) or onto multiple dies in a single chip package . A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches , and they may implement message passing or shared ...more...



Process (computing)

topic

In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently . A computer program is a passive collection of instructions , while a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening up several instances of the same program often means more than one process is being executed. Multitasking is a method to allow multiple processes to share processors (CPUs) and other system resources. Each CPU (core) executes a single task at a time. However, multitasking allows each processor to switch between tasks that are being executed without having to wait for each task to finish. Depending on the operating system implementation, switches could be performed when tasks perform input/output operations, when a task indicates that ...more...



Critical section

topic

In concurrent programming , concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed are protected. This protected section is the critical section or critical region. It cannot be executed by more than one process. Typically, the critical section accesses a shared resource, such as a data structure , a peripheral device, or a network connection, that would not operate correctly in the context of multiple concurrent accesses. Need for critical sections Different codes or processes may consist of the same variable or other resources that need to be read or written but whose results depend on the order in which the actions occur. For example, if a variable ‘x’ is to be read by process A, and process B has to write to the same variable ‘x’ at the same time, process A might get either the old or new value of ‘x’. Process A: // Process A . . b = x + 5 ; // instruction executes at time = Tx . Process B: // Process B . . x = 3 ...more...



System call

topic

A high-level overview of the Linux kernel's system call interface, which handles communication between its various components and the userspace In computing , a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. This may include hardware-related services (for example, accessing a hard disk drive ), creation and execution of new processes , and communication with integral kernel services such as process scheduling . System calls provide an essential interface between a process and the operating system. In most systems, system calls can only be made from userspace processes, while in some systems, OS/360 and successors for example, privileged system code also issues system calls. Privileges The architecture of most modern processors, with the exception of some embedded systems, involves a security model . For example, the rings model specifies multiple privilege levels under which software may be executed: a program is us ...more...



Linearizability

topic

In concurrent programming , an operation (or set of operations) is atomic , linearizable , indivisible or uninterruptible if it appears to the rest of the system to occur at once without being interrupted. Atomicity is a guarantee of isolation from interrupts , signals , concurrent processes and threads . It is relevant for thread safety and reentrancy . Additionally, atomic operations commonly have a succeed-or-fail definition—they either successfully change the state of the system, or have no apparent effect. In a concurrent system, processes can access a shared object at the same time. Because multiple processes are accessing a single object, there may arise a situation in which while one process is accessing the object, another process changes its contents. This example demonstrates the need for linearizability. In a linearizable system although operations overlap on a shared object, each operation appears to take place instantaneously. Linearizability is a strong correctness condition, which constrains w ...more...



Multiprocessing

topic

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system . The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined ( multiple cores on one die , multiple dies in one package , multiple packages in one system unit , etc.). According to some on-line dictionaries, a multiprocessor is a computer system having two or more processing units (multiple processors) each sharing main memory and peripherals, in order to simultaneously process programs. A 2009 textbook defined multiprocessor system similarly, but noting that the processors may share "some or all of the system’s memory and I/O facilities"; it also gave tightly coupled system as a synonymous term. At the operating system level, multiprocessing is sometimes used to refer to the exec ...more...



Context switch

topic

The process of a Context Switch In computing, a context switch is the process of storing the state of a process or of a thread , so that it can be restored and execution resumed from the same point later. This allows multiple processes to share a single CPU , and is an essential feature of a multitasking operating system . The precise meaning of the phrase “context switch” varies significantly in usage. In a multitasking context, it refers to the process of storing the system state for one task, so that that task can be paused and another task resumed. A context switch can also occur as the result of an interrupt , such as when a task needs to access disk storage , freeing up CPU time for other tasks. Some operating systems also require a context switch to move between user mode and kernel mode tasks. The process of context switching can have a negative impact on system performance, although the size of this effect depends on the nature of the switch being performed. Cost Context switches are usually computat ...more...



Memory-level parallelism

topic

Memory-level parallelism ( MLP ) is a term in computer architecture referring to the ability to have pending multiple memory operations, in particular cache misses or translation lookaside buffer (TLB) misses, at the same time. In a single processor, MLP may be considered a form of instruction-level parallelism (ILP). However, ILP is often conflated with superscalar , the ability to execute more than one instruction at the same time. E.g., a processor such as the Intel Pentium Pro is five-way superscalar, with the ability to start executing five different microinstructions in a given cycle, but it can handle four different cache misses for up to 20 different load microinstructions at any time. It is possible to have a machine that is not superscalar but which nevertheless has high MLP. Arguably a machine that has no ILP, which is not superscalar, which executes one instruction at a time in a non-pipelined manner, but which performs hardware prefetching (not software instruction level prefetching) exhibits MLP ...more...



Streaming SIMD Extensions

topic

In computing , Streaming SIMD Extensions ( SSE ) is an SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series of processors shortly after the appearance of AMD 's 3DNow! . SSE contains 70 new instructions, most of which work on single precision floating point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing . Intel's first IA-32 SIMD effort was the MMX instruction set. MMX had two main problems: it re-used existing floating point registers making the CPU unable to work on both floating point and SIMD data at the same time, and it only worked on integers . SSE floating point instructions operate on a new independent register set (the XMM registers), and it adds a few integer instructions that work on MMX registers. SSE was subsequently expanded by Intel to SSE2 , SSE3 , SSSE3 , and ...more...



Status register

topic

A status register , flag register , or condition code register is a collection of status flag bits for a processor . An example is the FLAGS register of the x86 architecture or flags in a program status word (PSW) register. The status register is a hardware register that contains information about the state of the processor . Individual bits are implicitly or explicitly read and/or written by the machine code instructions executing on the processor. The status register lets an instruction take action contingent on the outcome of a previous instruction. Typically, flags in the status register are modified as effects of arithmetic and bit manipulation operations. For example, a Z bit may be set if the result of the operation is zero and cleared if it is nonzero. Other classes of instructions may also modify the flags to indicate status. For example, a string instruction may do so to indicate whether the instruction terminated because it found a match/mismatch or because it found the end of the string. The flags ...more...



MIPS architecture

topic

MIPS (an acronym for Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Technologies (formerly MIPS Computer Systems). The early MIPS architectures were 32-bit, with 64-bit versions added later. There are multiple versions of MIPS: including MIPS I, II, III, IV, and V; as well as five releases of MIPS32/64 (for 32- and 64-bit implementations, respectively). As of April 2017, the current version is MIPS32/64 Release 6. MIPS32/64 primarily differs from MIPS I–V by defining the privileged kernel mode System Control Coprocessor in addition to the user mode architecture. Several optional extensions are also available, including MIPS-3D which is a simple set of floating-point SIMD instructions dedicated to common 3D tasks, MDMX (MaDMaX) which is a more extensive integer SIMD instruction set using the 64-bit floating-point registers, MIPS16e which adds compression to the instruction stream to make programs take u ...more...



Microarchitecture

topic

Intel Core microarchitecture In computer engineering , microarchitecture , also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA), or the ways the PCB is pathed in the Processing unit, is implemented in a particular processor . A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology. Computer architecture is the combination of microarchitecture and instruction set. Relation to instruction set architecture A microarchitecture organized around a single bus The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model , processor registers , address and data formats among other things. The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The ...more...



MAJC

topic

MAJC (Microprocessor Architecture for Java Computing) was a Sun Microsystems multi-core, multithreaded, very long instruction word (VLIW) microprocessor design from the mid-to-late 1990s. Originally called the UltraJava processor, the MAJC processor was targeted at running Java programs, whose "late compiling" allowed Sun to make several favourable design decisions. The processor was released into two commercial graphical cards from Sun. Lessons learned regarding multi-threads on a multi-core processor provided a basis for later OpenSPARC implementations such as the UltraSPARC T1 . Design elements Move instruction scheduling to the compiler Like other VLIW designs, notably Intel 's IA-64 (Itanium), MAJC attempted to improve performance by moving several expensive operations out of the processor and into the related compilers. In general, VLIW designs attempt to eliminate the instruction scheduler, which often represents a relatively large amount of the overall processor's transistor budget. With this portion ...more...



Task parallelism

topic

Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurrently performed by processes or threads—across different processors. In contrast to data parallelism, which involves running the same task on different components of data, task parallelism is distinguished by running many different tasks at the same time on the same data. A common type of task parallelism is pipelining, which consists of moving a single set of data through a series of separate tasks where each task can execute independently of the others. Description In a multiprocessor system, task parallelism is achieved when each processor executes a different thread (or process) on the same or different data. The threads may execute the same or different code. In the general case, different execution threads communicate with one another as they work, but this is not a req ...more...



Digital signal processor

topic

A digital signal processor (DSP) is a specialized microprocessor (or a SIP block), with its architecture optimized for the operational needs of digital signal processing. The goal of DSPs is usually to measure, filter or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but dedicated DSPs usually have better power efficiency, and are thus more suitable in portable devices such as mobile phones because of power consumption constraints. DSPs often use special memory architectures that are able to fetch multiple data or instructions at the same time. Overview Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted back to analog ...more...



Graphics Core Next

topic

Graphics Core Next (GCN) is the codename for both a series of microarchitectures and an instruction set. GCN was developed by AMD for their GPUs as the successor to the TeraScale microarchitecture/instruction set. The first product featuring GCN was launched in 2011. GCN is a RISC SIMD (or rather SIMT) microarchitecture, contrasting with the VLIW SIMD architecture of TeraScale. GCN requires considerably more transistors than TeraScale, but offers advantages for GPGPU computation. It makes the compiler simpler and should also lead to better utilization. GCN is fabricated in 28 nm and 14 nm graphics chips, available on selected models in the Radeon HD 7000, HD 8000, 200, 300, 400 and 500 series of AMD Radeon graphics cards. GCN is also used in the graphics portion of AMD Accelerated Processing Units (APU), such as in the PlayStation 4 and Xbox One APUs. Instruction set AMD owns the GCN instruction set, as it does the X86-64 instruction set. The GCN instruction set has been developed specificall ...more...



IA-64

topic

IA-64 (also called the Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was evolved and then implemented in a new processor microarchitecture by Intel with HP's continued partnership and expertise on the underlying EPIC design concepts. In order to establish what was their first new ISA in 20 years and bring an entirely new product line to market, Intel made a massive investment in product definition, design, software development tools, OS, software industry partnerships, and marketing. To support this effort Intel created the largest design team in their history and a new marketing and industry enabling team completely separate from x86. The first Itanium processor, codenamed Merced, was released in 2001. The Itanium architecture is based on explicit instruction-level parallelism, in which the compiler decides which instructions to execu ...more...



Synchronization (computer science)

topic

In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or maintaining data integrity. Process synchronization primitives are commonly used to implement data synchronization. The need for synchronization The need for synchronization arises not merely in multi-processor systems but for any kind of concurrent processes, even in single-processor systems. Mentioned below are some of the main needs for synchronization: Forks and Joins : When a job arrives at a fork point, it is split into N sub-jobs which are then serviced by N tasks. After being serviced, each sub-job waits until all other sub-jobs are done processing. T ...more...



SIMT

topic

The abbreviation SIMT may mean: School of International Management and Technology Stuttgart Institute of Management and Technology Single Instruction Multiple Threads , relates to SIMD (Single Instruction Multiple Data) Saigon Institute of Management and Technology ...more...



RISC-V

topic

The logo of the RISC-V ISA RISC-V processor prototype, January 2013 RISC-V (pronounced "risk-five") is an open instruction set architecture (ISA) based on established reduced instruction set computing (RISC) principles. In contrast to most ISAs, the RISC-V ISA can be freely used for any purpose, permitting anyone to design , manufacture and sell RISC-V chips and software . While not the first open ISA, it is significant because it is designed to be useful in modern computerized devices such as warehouse-scale cloud computers , high-end mobile phones and the smallest embedded systems . Such uses demand that the designers consider both performance and power efficiency. The instruction set also has a substantial body of supporting software, which fixes a usual weakness of new instruction sets. The project began in 2010 at the University of California, Berkeley , but many contributors are volunteers and industry workers outside the university. The RISC-V ISA has been designed with small, fast, and low-power real- ...more...




Computer multitasking

topic

Modern desktop operating systems are capable of handling large numbers of different processes at the same time. This screenshot shows Linux Mint simultaneously running the Xfce desktop environment, Firefox, a calculator program, the built-in calendar, Vim, GIMP, and VLC media player. Multitasking capabilities of Microsoft Windows 1.01, released in 1985, here shown running the MS-DOS Executive and Calculator programs. In computing, multitasking is a concept of performing multiple tasks (also known as processes) over a certain period of time by executing them concurrently. New tasks start and interrupt already started ones before they have reached completion, instead of the tasks executing sequentially, where each started task must reach its end before a new one is started. As a result, a computer executes segments of multiple tasks in an interleaved manner, while the tasks share common processing resources such as central processing units (CPUs) and main memory. Multitasking does not necessarily mean that mu ...more...



Super Harvard Architecture Single-Chip Computer

topic

The Super Harvard Architecture Single-Chip Computer ( SHARC ) is a high performance floating-point and fixed-point DSP from Analog Devices . SHARC is used in a variety of signal processing applications ranging from single-CPU guided artillery shells to 1000-CPU over-the-horizon radar processing computers. The original design dates to about January 1994. SHARC processors are or were used because they have offered good floating-point performance per watt . SHARC processors are typically intended to have a good number of serial links to other SHARC processors nearby, to be used as a low-cost alternative to SMP . Architecture The SHARC is a Harvard architecture word-addressed VLIW processor; it knows nothing of 8-bit or 16-bit values since each address is used to point to a whole 32-bit word, not just an octet . It is thus neither little-endian nor big-endian, though a compiler may use either convention if it implements 64-bit data and/or some way to pack multiple 8-bit or 16-bit values into a single 32-bit word. ...more...



Lock (computer science)

topic

In computer science , a lock or mutex (from mutual exclusion ) is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution . A lock is designed to enforce a mutual exclusion concurrency control policy. Types Generally, locks are advisory locks, where each thread cooperates by acquiring the lock before accessing the corresponding data. Some systems also implement mandatory locks, where attempting unauthorized access to a locked resource will force an exception in the entity attempting to make the access. The simplest type of lock is a binary semaphore . It provides exclusive access to the locked data. Other schemes also provide shared access for reading data. Other widely implemented access modes are exclusive, intend-to-exclude and intend-to-upgrade. Another way to classify locks is by what happens when the lock strategy prevents progress of a thread. Most locking designs block the execution of the thread requesting the lock until it ...more...



Microthread

topic

Microthreads are functions that may run in parallel to gain increased performance in microprocessors. They provide an execution model that uses a few additional instructions in a conventional processor to break code down into fragments that execute simultaneously. Dependencies are managed by making the registers of the microprocessors executing the code synchronising, so one microthread will wait for another to produce data. This is a form of dataflow. This model can be applied to an existing instruction set architecture incrementally by providing just five new instructions to implement concurrency controls. A set of microthreads is a static partition of a basic block into concurrently executing fragments, which execute on a single processor and share a microcontext. An iterator over a set, created dynamically, provides a dynamic and parametric family of microthreads. Iterators capture loop concurrency and can be scheduled to different processors. ...more...



Latency oriented processor architecture

topic

Latency oriented processor architecture is the microarchitecture of a microprocessor designed to serve a serial computing thread with a low latency. This is typical of most central processing units (CPUs) developed since the 1970s. These architectures, in general, aim to execute as many instructions as possible belonging to a single serial thread, in a given window of time; however, the time to execute a single instruction completely from fetch to retire stages may vary from a few cycles to even a few hundred cycles in some cases. Latency oriented processor architectures are the opposite of throughput-oriented processors, which concern themselves more with the total throughput of the system rather than the service latencies for all individual threads that they work on. Flynn's taxonomy Latency oriented processor architectures would normally fall into the SISD category under Flynn's taxonomy. This implies a typical characteristic of latency oriented processor architectures is to execute ...more...



SSE3

topic

SSE3 , Streaming SIMD Extensions 3 , also known by its Intel code name Prescott New Instructions ( PNI ), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU. In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs. The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX , 3DNow! (developed by AMD, but not supported by Intel processors), SSE , and SSE2 . SSE3 contains 13 new instructions over SSE2 . Changes The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, instructions to add and subtract the multiple values stored within a single register have been added. These instructions can be used to speed up the implementation of a number of DSP and 3D operations. There is also a new i ...more...



Pascal (microarchitecture)

topic

Pascal is the codename for a GPU microarchitecture developed by Nvidia, as the successor to the Maxwell architecture. The architecture was first introduced in April 2016 with the release of the Tesla P100 (GP100) on April 5, 2016, and is primarily used in the GeForce 10 series, starting with the GeForce GTX 1080 and GTX 1070 (both using the GP104 GPU), which were released on May 27, 2016 and June 10, 2016 respectively. Pascal is manufactured using the 16 nm FinFET process. The architecture is named after the 17th century French mathematician and physicist, Blaise Pascal. Details In March 2014, Nvidia announced that the successor to Maxwell would be the Pascal microarchitecture; the GeForce GTX 1080 was announced on 6 May 2016 and released on 27 May 2016. The Tesla P100 (GP100 chip) has a different version of the Pascal architecture compared to the GTX GPUs (GP104 chip). The shader units in GP104 have a Maxwell-like design. Architectural improvements of the GP100 architecture include the following: In Pascal, an SM (streaming multi ...more...



Compare-and-swap

topic

In computer science , compare-and-swap ( CAS ) is an atomic instruction used in multithreading to achieve synchronization . It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response (this variant is often called compare-and-set ), or by returning the value read from the memory location (not the value written to it). Overview A compare-and-set operation is an atomic version of the following pseudocode , where * denotes access through a pointer : function cas(p : pointer to int, old : int, new : int) returns bool { if *p ≠ old { return false ...more...



Qualcomm Hexagon

topic

Hexagon (QDSP6) is the brand for a family of 32-bit multi-threaded microarchitectures implementing the same instruction set for a digital signal processor (DSP) developed by Qualcomm. According to a 2012 estimate, Qualcomm shipped 1.2 billion DSP cores inside its system on a chip (SoC) products in 2011 (an average of 2.3 DSP cores per SoC), and 1.5 billion cores were planned for 2012, making the QDSP6 the most-shipped DSP architecture (CEVA had around 1 billion DSP cores shipped in 2011, with 90% of the IP-licenseable DSP market). The Hexagon architecture is designed to deliver performance with low power over a variety of applications. It has features such as hardware assisted multithreading, privilege levels, Very Long Instruction Word (VLIW), Single Instruction, Multiple Data (SIMD), and instructions geared toward efficient signal processing. The CPU is capable of in-order dispatching of up to 4 instructions (a packet) to 4 execution units every clock. Hardware multithreading is implemented as barrel tempo ...more...



Von Neumann programming languages

topic

A von Neumann language is any of those programming languages that are high-level abstract isomorphic copies of von Neumann architectures . As of 2009, most current programming languages fit into this description, likely as a consequence of the extensive domination of the von Neumann computer architecture during the past 50 years. The differences between Fortran , C , and even Java , although considerable, are ultimately constrained by all three being based on the programming style of the von Neumann computer. If, for example, Java objects were all executed in parallel with asynchronous message passing and attribute-based declarative addressing, then Java would not be in the group. The isomorphism between von Neumann programming languages and architectures is in the following manner: program variables ↔ computer storage cells control statements ↔ computer test-and-jump instructions assignment statements ↔ fetching, storing instructions expressions ↔ memory reference and arithmetic instructions. Criticism John ...more...



OpenMP

topic

OpenMP ( Open Multi-Processing ) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C , C++ , and Fortran , on most platforms, instruction set architectures and operating systems , including Solaris , AIX , HP-UX , Linux , macOS , and Windows . It consists of a set of compiler directives , library routines , and environment variables that influence run-time behavior. OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (or OpenMP ARB), jointly defined by a group of major computer hardware and software vendors, including AMD , IBM , Intel , Cray , HP , Fujitsu , Nvidia , NEC , Red Hat , Texas Instruments , Oracle Corporation , and more. OpenMP uses a portable , scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer . An application built with the hybrid model of parallel progra ...more...



Computer performance

topic

Computer performance is the amount of work accomplished by a computer system. Depending on the context, high computer performance may involve one or more of the following: Short response time for a given piece of work High throughput (rate of processing work) Low utilization of computing resource (s) High availability of the computing system or application Fast (or highly compact) data compression and decompression High bandwidth Short data transmission time Technical and non-technical definitions The performance of any computer system can be evaluated in measurable, technical terms, using one or more of the metrics listed above. This way the performance can be Compared relative to other systems or the same system before/after changes In absolute terms, e.g. for fulfilling a contractual obligation Whilst the above definition relates to a scientific, technical approach, the following definition given by Arnold Allen would be useful for a non-technical audience: The word performance in computer performance mean ...more...



Athlon 64 X2

topic

The Athlon 64 X2 is the first dual-core desktop CPU designed by AMD . It was designed from scratch as native dual-core by using an already multi-CPU enabled Athlon 64 , joining it with another functional core on one die , and connecting both via a shared dual-channel memory controller/north bridge and additional control logic. The initial versions are based on the E-stepping model of the Athlon 64 and, depending on the model, have either 512 or 1024 KB of L2 Cache per core. The Athlon 64 X2 is capable of decoding SSE3 instructions (except those few specific to Intel's architecture). In June 2007, AMD released low-voltage variants of their low-end 65 nm Athlon 64 X2 , named " Athlon X2 ". The Athlon X2 processors feature reduced TDP of 45 W . The name was also used for K10 based budget CPUs with two cores deactivated. Multithreading The primary benefit of dual-core processors (like the Athlon 64 X2) over single-core processors is their ability to process more software threads at the same time. The ability of ...more...



Kepler (microarchitecture)

topic

Kepler is the codename for a GPU microarchitecture developed by Nvidia, first introduced at retail in April 2012, as the successor to the Fermi microarchitecture. Kepler was Nvidia's first microarchitecture to focus on energy efficiency. Most GeForce 600 series, most GeForce 700 series, and some GeForce 800M series GPUs were based on Kepler, all manufactured in 28 nm. Kepler also found use in the GK20A, the GPU component of the Tegra K1 SoC, as well as in the Quadro Kxxx series, the Quadro NVS 510, and Nvidia Tesla computing modules. Kepler was followed by the Maxwell microarchitecture and used alongside Maxwell in the GeForce 700 series and GeForce 800M series. The architecture is named after Johannes Kepler, a German mathematician and key figure in the 17th century scientific revolution. Overview Whereas the goal of Nvidia's previous architecture was to increase performance on compute and tessellation, with the Kepler architecture Nvidia shifted its focus to efficiency, programmability ...more...



Fetch-and-add

topic

In computer science, the fetch-and-add CPU instruction (FAA) atomically increments the contents of a memory location by a specified value. That is, fetch-and-add performs the operation in such a way that if this operation is executed by one process in a concurrent system, no other process will ever see an intermediate result. Fetch-and-add can be used to implement concurrency control structures such as mutex locks and semaphores. Overview The motivation for having an atomic fetch-and-add is that operations that appear in programming languages as x = x + a are not safe in a concurrent system, where multiple processes or threads are running concurrently (either in a multi-processor system, or preemptively scheduled onto some single-core systems). The reason is that such an operation is actually implemented as multiple machine instructions: fetch the value at location x into a register; add a to the register; store the new value of the register back into x. When one process is doing x = x + a and ano ...more...



