
Single instruction, multiple threads

Single instruction, multiple threads (SIMT) is an execution model used in parallel computing in which single instruction, multiple data (SIMD) is combined with multithreading.

Overview

The processors, say p of them, appear to execute many more than p tasks. This is achieved by giving each processor multiple "threads" (also called "work-items" or "sequences of SIMD-lane operations"), which execute in lock-step and are analogous to SIMD lanes.[1]

The SIMT execution model has been implemented on several GPUs and is relevant to general-purpose computing on graphics processing units (GPGPU); for example, some supercomputers combine CPUs with GPUs.

SIMT was introduced by Nvidia:[2] [3]

Nvidia's Tesla GPU microarchitecture (first available November 8, 2006 as implemented in the "G80" GPU chip) introduced the single-instruction multiple-thread (SIMT) execution model where multiple independent threads execute concurrently using a single instruction.

ATI Technologies (now AMD) released a competing product slightly later on May 14, 2007, the TeraScale 1-based "R600" GPU chip.

As the access time of all widespread RAM types (e.g. DDR SDRAM, GDDR SDRAM, XDR DRAM, etc.) is still relatively high, engineers came up with the idea to hide the latency that inevitably comes with each memory access. Strictly, latency hiding is a feature of the zero-overhead scheduling implemented by modern GPUs; it might or might not be considered a property of SIMT itself.

SIMT is intended to limit instruction-fetching overhead,[4] i.e. the latency that comes with memory access, and is used in modern GPUs (such as those of Nvidia and AMD) in combination with "latency hiding" to enable high-performance execution despite considerable latency in memory-access operations. With latency hiding, the processor is oversubscribed with computation tasks and can quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to multithreading in CPUs (not to be confused with multi-core).[5]

A downside of SIMT execution is that thread-specific control flow is performed using "masking", leading to poor utilization when a processor's threads follow different control-flow paths. For instance, to handle an IF-ELSE block where various threads of a processor take different paths, all threads must actually process both paths (since all threads of a processor always execute in lock-step), but masking is used to disable and enable the threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and it has the benefit of inexpensive synchronization between the threads of a processor.[6]

Content from Wikipedia, licensed under CC-BY-SA.

Register file

topic

A register file is an array of processor registers in a central processing unit (CPU). Modern integrated circuit -based register files are usually implemented by way of fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs will usually read and write through the same ports. The instruction set architecture of a CPU will almost always define a set of registers which are used to stage data between memory and the functional units on the chip. In simpler CPUs, these architectural registers correspond one-for-one to the entries in a physical register file (PRF) within the CPU. More complicated CPUs use register renaming , so that the mapping of which physical entry stores a particular architectural register changes dynamically during execution. The register file is part of the architecture and visible to the programmer, as opposed to the concept of transparent caches . Register bank switching Register files may be clubbed tog ...more...



Work stealing

topic

In parallel computing , work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors (or cores ). It does so efficiently both in terms of execution time, memory usage, and inter-processor communication. In a work stealing scheduler, each processor in a computer system has a queue of work items (computational tasks, threads) to perform. Each work item consists of a series of instructions, to be executed sequentially, but in the course of its execution, a work item may also spawn new work items that can feasibly be executed in parallel with its other work. These new items are initially put on the queue of the processor executing the work item. When a processor runs out of work, it looks at the queues of other processors and "steals" their work items. In effect, work stealing distributes the scheduling w ...more...



Montecito (processor)

topic

Montecito is the code-name of a major release of Intel 's Itanium 2 Processor Family (IPF), which implements the Intel Itanium architecture on a dual-core processor. It was officially launched by Intel on July 18, 2006 as the "Dual-Core Intel Itanium 2 processor". According to Intel, Montecito doubles performance versus the previous, single-core Itanium 2 processor, and reduces power consumption by about 20%. [1] It also adds multi-threading capabilities (two threads per core), a greatly expanded cache subsystem (12 MB per core), and silicon support for virtualization. Architectural Features and Attributes Two cores per die 2-way coarse-grained multithreading per core (not simultaneous). Montecito-flavour of multi-threading is dubbed temporal, or TMT. This is also known as switch-on-event multithreading, or SoEMT. The two separate threads do not run simultaneously, but the core switches thread in case of a high latency event, like an L3 cache miss which would otherwise stall execution. By this technique, mult ...more...



XCore XS1-G4

topic

The XS1-G4 is a processor designed by XMOS . It is a 32-bit quad-core processor, where each core runs up to 8 concurrent threads. It was available as of Autumn 2008 running at 400 MHz. Each thread can run at up to 100 MHz; four threads follow each other through the pipeline , resulting in a top speed of 1.6 GIPS for four cores if 16 threads are running. The XS1-G4 is a distributed memory multi core processor, requiring the end user and compiler to deal with data distribution. When more than 4 threads execute, the 400 MIPS of each core is equally distributed over all active threads. This allows the use of extra threads in order to hide latency. Description The XS1-G4 comprises four cores and a switch. Each core has a data path, a memory, and register banks for eight threads. Threads running on different cores can communicate with each other by exchanging messages through the switches. Switches of multiple G4s can be connected to form a larger system. The instruction set supports the notion of a channel , a vir ...more...



Bonnell (microarchitecture)

topic

Bonnell is a CPU microarchitecture used by Intel Atom processors which can execute up to two instructions per cycle. Like many other x86 microprocessors, it translates x86 instructions ( CISC instructions) into simpler internal operations (sometimes referred to as micro-ops , effectively RISC style instructions) prior to execution. The majority of instructions produce one micro-op when translated, with around 4% of instructions used in typical programs producing multiple micro-ops. The number of instructions that produce more than one micro-op is significantly fewer than the P6 and NetBurst microarchitectures . In the Bonnell microarchitecture, internal micro-ops can contain both a memory load and a memory store in connection with an ALU operation, thus being more similar to the x86 level and more powerful than the micro-ops used in previous designs. This enables relatively good performance with only two integer ALUs, and without any instruction reordering , speculative execution or register renaming . The ...more...



4D vector

topic

In computer science , a 4D vector is a 4-component vector data type . Uses include homogeneous coordinates for 3-dimensional space in computer graphics , and red green blue alpha ( RGBA ) values for bitmap images with a color and alpha channel (as such they are widely used in computer graphics). They may also represent quaternions (useful for rotations) although the algebra they define is different. Computer hardware support Some microprocessors have hardware support for 4D vectors with instructions dealing with 4 lane single instruction, multiple data ( SIMD ) instructions, usually with a 128-bit data path and 32-bit floating point fields. Specific instructions (e.g., 4 element dot product ) may facilitate the use of one 128-bit register to represent a 4D vector. For example, in chronological order: Hitachi SH4 , PowerPC VMX128 extension, and Intel x86 SSE4. Some 4-element vector engines (e.g., the PS2 vector units ) went further with the ability to broadcast components as multiply sources, and cross produc ...more...



NetBurst (microarchitecture)

topic

The NetBurst microarchitecture , called P68 inside Intel , was the successor to the P6 microarchitecture in the x86 family of CPUs made by Intel. The first CPU to use this architecture was the Willamette-core Pentium 4, released on November 20, 2000 and the first of the Pentium 4 CPUs; all subsequent Pentium 4 and Pentium D variants have also been based on NetBurst. In mid-2004, Intel released the Foster core, which was also based on NetBurst, thus switching the Xeon CPUs to the new architecture as well. Pentium 4-based Celeron CPUs also use the NetBurst architecture. NetBurst was replaced with the Core microarchitecture , released in July 2006. Technology The NetBurst microarchitecture includes features such as Hyper-Threading, Hyper Pipelined Technology, Rapid Execution Engine and Replay System which are firsts in this particular microarchitecture. Hyper-Threading Hyper-threading is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing mu ...more...



IBM POWER microprocessors

topic

IBM has a series of high performance microprocessors called POWER followed by a number designating generation, i.e. POWER1, POWER2, POWER3 and so forth up to the latest POWER9. These processors have been used by IBM in their RS/6000 , AS/400 , pSeries , iSeries , System p , System i and Power Systems line of servers and supercomputers . They have also been used in data storage devices by IBM and by other server manufacturers like Bull and Hitachi . The name "POWER" was originally presented as an acronym for "Performance Optimization With Enhanced RISC". The POWERn family of processors were developed in the late 1980s and are still in active development nearly 30 years later. In the beginning, they utilized the POWER instruction set architecture (ISA), but that evolved into PowerPC in later generations and then to Power Architecture . Today, only the naming scheme remains the same; modern POWER processors do not use the POWER ISA. History Early developments The 801 research project In 1974 IBM started a projec ...more...



CPU cache

topic

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory . A cache is a smaller, faster memory, closer to a processor core , which stores copies of the data from frequently used main memory locations . Most CPUs have different independent caches, including instruction and data caches , where the data cache is usually organized as a hierarchy of more cache levels (L1, L2, etc.). All modern (fast) CPUs (with few specialized exceptions ) have multiple levels of CPU caches. The first CPUs that used a cache had only one level of cache; unlike later level 1 caches, it was not split into L1d (for data) and L1i (for instructions). Almost all current CPUs with caches have a split L1 cache. They also have L2 caches and, for larger processors, L3 caches as well. The L2 cache is usually not split and acts as a common repository for the already split L1 cache. Every core of a multi-core processor has a dedicated ...more...



Data parallelism

topic

Data parallelism is a form of parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism. A data parallel job on an array of 'n' elements can be divided equally among all the processors. Let us assume we want to sum all the elements of the given array and the time for a single addition operation is Ta time units. In the case of sequential execution, the time taken by the process will be n*Ta time units as it sums up all the elements of an array. On the other hand, if we execute this job as a data parallel job on 4 processors the time taken would reduce to (n/4)*Ta + merging overhead time units. Parallel execution results in a speedup of 4 over sequential execution. One important thing to note is that the locality of data ...more...



Microcode

topic

Microcode is "a technique that imposes an interpreter between the hardware and the architectural level of a computer". As such, the microcode is a layer of hardware-level instructions that implement higher-level machine code instructions or internal state machine sequencing in many digital processing elements. Microcode is used in general-purpose central processing units , as well as in more specialized processors such as microcontrollers , digital signal processors , channel controllers , disk controllers , network interface controllers , network processors , graphics processing units , and in other hardware. Microcode typically resides in special high-speed memory and translates machine instructions, state machine data or other input into sequences of detailed circuit-level operations. It separates the machine instructions from the underlying electronics so that instructions can be designed and altered more freely. It also facilitates the building of complex multi-step instructions, while reducing the comp ...more...



Very long instruction word

topic

Very long instruction word ( VLIW ) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute at the same time, concurrently, in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs. Overview The traditional means to improve performance in processors include dividing instructions into substeps so the instructions can be executed partly at the same time (termed pipelining), dispatching individual instructions to be executed independently, in different parts of the processor ( superscalar architectures), and even executing instructions in an order different from the program ( out-of-order execution ). These methods all complicate hardware (larger circuits, higher cost and energy use) because the pr ...more...



Task (computing)

topic

In computing , a task is a unit of execution or a unit of work. The term is ambiguous; precise alternative terms include process , light-weight process , thread (for execution), step, request , or query (for work). In the adjacent diagram, there are queues of incoming work to do and outgoing completed work, and a thread pool of threads to perform this work. Either the work units themselves or the threads that perform the work can be referred to as "tasks", and these can be referred to respectively as requests/responses/threads, incoming tasks/completed tasks/threads (as illustrated), or requests/responses/tasks. Terminology In the sense of "unit of execution", in some operating systems , a task is synonymous with a process , and in others with a thread . In non-interactive execution ( batch processing ), a task is a unit of execution within a job , with the task itself typically a process. The term " multitasking " primarily refers to the processing sense – multiple tasks executing at the same time – but ha ...more...



SPARC Enterprise

topic

The SPARC Enterprise series is a range of UNIX server computers based on the SPARC V9 architecture. It was co-developed by Sun Microsystems and Fujitsu , and introduced in 2007. They were marketed and sold by Sun Microsystems (later Oracle Corporation , after their acquisition of Sun ), Fujitsu, and Fujitsu Siemens Computers under the common brand of "SPARC Enterprise", superseding Sun's Sun Fire and Fujitsu's PRIMEPOWER server product lines. Since 2010, servers based on new SPARC CMT processors ( SPARC T3 and later) have been branded as Oracle's SPARC T-Series servers. Model range Model Height, RU Max. processors Processor frequency Max. memory Max. disk capacity GA date M3000 2 1× SPARC64 VII or VII+ 2.52, 2.75 GHz (VII) or 2.86 GHz (VII+) 64 GB 4× 2.5-inch SAS October 2008 (2.52 GHz), February 2010 (2.75 GHz), April 2011 (VII+) M4000 6 4× SPARC64 VI or VII or VII+ 2.15 GHz (VI), 2.53 GHz (VII), or 2.66 GHz (VII+) 256 GB 2× 2.5-inch SAS April 2007 (VI), July 2008 (VII), December 2010 (VII+) M5000 10 8× SPAR ...more...



List of Intel microprocessors

topic

This generational list of Intel processors attempts to present all of Intel 's processors from the pioneering 4-bit 4004 (1971) to the present high-end offerings, which include the 64-bit Itanium 2 (2002), Intel Core i9 , and Xeon E3 and E5 series processors (2015). Concise technical data is given for each product. Latest desktop and mobile processors for consumers 8th generation Core/Coffee Lake/Kaby Lake Refresh Desktop Model Price (USD) Cores/Threads Base frequency (GHz) Max turbo frequency (GHz) GPU L3 cache (MB) TDP (W) Socket Release i7-8700K $359 6/12 3.7 4.7 UHD 630 1200 12 95 LGA 1151 Q4 2017 i7-8700 $303 6/12 3.2 4.6 UHD 630 1200 12 65 LGA 1151 Q4 2017 i5-8600K $257 6/6 3.6 4.3 UHD 630 1150 9 95 LGA 1151 Q4 2017 i5-8400 $182 6/6 2.8 4.0 UHD 630 1050 9 65 LGA 1151 Q4 2017 i3-8350K $168 4/4 4.0 N/A UHD 630 1150 8 91 LGA 1151 Q4 2017 i3-8100 $117 4/4 3.6 N/A UHD 630 1100 6 65 LGA 1151 Q4 2017 Mobile Model Price (USD) Cores/Threads Base frequency (GHz) Max turbo frequency (GHz) GPU Maximum GPU clock rat ...more...



Symmetric multiprocessing

topic

Diagram of a symmetric multiprocessing system Symmetric multiprocessing ( SMP ) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory , have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes. Most multiprocessor systems today use an SMP architecture. In the case of multi-core processors , the SMP architecture applies to the cores, treating them as separate processors. SMP systems are tightly coupled multiprocessor systems with a pool of homogeneous processors running independently of each other. Each processor, executing different programs and working on different sets of data, has the capability of sharing common resources (memory, I/O device, interrupt system and so on) that are connected using a system bus or a crossbar . Design SMP systems have centralized shared memory called main me ...more...



SPARC

topic

A Sun UltraSPARC II microprocessor (1997) SPARC , for Scalable Processor Architecture , is a reduced instruction set computing (RISC) instruction set architecture (ISA) originally developed by Sun Microsystems . Its design was strongly influenced by the experimental Berkeley RISC system developed in the early 1980s. First released in 1987, SPARC was one of the most successful early commercial RISC systems, and its success led to the introduction of similar RISC designs from a number of vendors through the 1980s and 90s. The first implementation of the original 32-bit architecture (SPARC V7) was used in Sun's Sun-4 workstation and server systems, replacing their earlier Sun-3 systems based on the Motorola 68000 series of processors. SPARC V8 added a number of improvements that were part of the SuperSPARC series of processors released in 1992. SPARC V9, released in 1993, introduced a 64-bit architecture and was first released in Sun's UltraSPARC processors in 1995. Later, SPARC processors were used in SMP and C ...more...



UltraSPARC T1

topic

Sun Microsystems ' UltraSPARC T1 microprocessor , known until its 14 November 2005 announcement by its development codename " Niagara ", is a multithreading , multicore CPU . Designed to lower the energy consumption of server computers , the CPU typically uses 72 W of power at 1.4 GHz. Afara Websystems pioneered a radical thread-heavy SPARC design. The company was purchased by Sun, and the intellectual property became the foundation of the CoolThreads line of processors, starting with the T1. The T1 is a new-from-the-ground-up SPARC microprocessor implementation that conforms to the UltraSPARC Architecture 2005 specification and executes the full SPARC V9 instruction set . Sun has produced two previous multicore processors ( UltraSPARC IV and IV+), but UltraSPARC T1 is its first microprocessor that is both multicore and multithreaded. The processor is available with four, six or eight CPU cores, each core able to handle four threads concurrently. Thus the processor is capable of processing up to 32 threads co ...more...



FR-V (microprocessor)

topic

The Fujitsu FR-V (Fujitsu RISC - VLIW ) is one of the very few processors ever able to process both a very long instruction word (VLIW) and vector processor instructions at the same time, increasing throughput with high parallel computing while increasing performance per watt and hardware efficiency. The family was presented in 1999. Its design was influenced by the VPP500/5000 models of the Fujitsu VP / 2000 vector processor supercomputer line. Featuring a 1-8 way very long instruction word (VLIW, Multiple Instruction Multiple Data (MIMD), up to 256 bit) instruction set it additionally uses a 4-way single instruction, multiple data (SIMD) vector processor core. A 32-bit RISC instruction set in the superscalar core is combined with most variants integrating a dual 16-bit media processor also in VLIW and vector architecture. Each processor core is superpipelined as well as 4-unit superscalar . A typical integrated circuit integrates a system on a chip and further multiplies speed by integrating multiple cores ...more...



IBM RS64

topic

The IBM RS64 is a family of microprocessors that were used in the late 1990s in IBM's RS/6000 and AS/400 servers . These microprocessors implement the "Amazon", or "PowerPC-AS", instruction set architecture (ISA). Amazon is a superset of the PowerPC instruction set, with the addition of special features not in the PowerPC specification, mainly derived from POWER2 and the original AS/400 processor, and has been 64-bit from the start. The processors in this family are optimized for commercial workloads (integer performance, large caches, branches) and do not feature the strong floating point performance of the processors in the IBM POWER microprocessors family, its sibling. The RS64 family was phased out soon after the introduction of the POWER4 , which was developed to unite the RS64 and POWER families. History In 1990 the Amazon project was started to create a common architecture that would host both AIX and OS/400 . The AS/400 engineering team at IBM was designing a RISC instruction set to replace the CISC i ...more...



Coroutine

topic

Coroutines are computer-program components that generalize subroutines for non-preemptive multitasking , by allowing multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing familiar program components such as cooperative tasks , exceptions , event loops , iterators , infinite lists and pipes . According to Donald Knuth , Melvin Conway coined the term coroutine in 1958 when he applied it to construction of an assembly program . The first published explanation of the coroutine appeared later, in 1963. Comparison with subroutines Subroutines are special cases of ... coroutines. —  Donald Knuth . When subroutines are invoked, execution begins at the start, and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations. By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine; from t ...more...



Glossary of operating systems terms

topic

This page is a glossary of Operating systems terminology. A access token : In Microsoft Windows operating systems , an access token contains the security credentials for a login session and identifies the user , the user's groups, the user's privileges, and, in some cases, a particular application. B binary semaphore : See semaphore. booting : In computing , booting (also known as booting up ) is the initial set of operations that a computer system performs after electrical power to the CPU is switched on or when the computer is reset. On modern general purpose computers, this can take tens of seconds and typically involves performing a power-on self-test , locating and initializing peripheral devices, and then finding, loading and starting the operating system . C cache : In computer science , a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of ...more...



Monitor (synchronization)

topic

In concurrent programming , a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become true. Monitors also have a mechanism for signaling other threads that their condition has been met. A monitor consists of a mutex (lock) object and condition variables . A condition variable is basically a container of threads that are waiting for a certain condition. Monitors provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task. Another definition of monitor is a thread-safe class , object , or module that uses wrapped mutual exclusion in order to safely allow access to a method or variable by more than one thread . The defining characteristic of a monitor is that its methods are executed with mutual exclusion : At each point in time, at most one thread may be executing any of its methods . By using one o ...more...



Translation lookaside buffer

topic

A translation lookaside buffer ( TLB ) is a memory cache that is used to reduce the time taken to access a user memory location. It is a part of the chip’s memory-management unit (MMU). The TLB stores the recent translations of virtual memory to physical memory and can be called an address-translation cache. A TLB may reside between the CPU and the CPU cache , between CPU cache and the main memory or between the different levels of the multi-level cache. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and it is nearly always present in any processor that utilizes paged or segmented virtual memory . The TLB is sometimes implemented as content-addressable memory (CAM). The CAM search key is the virtual address, and the search result is a physical address . If the requested address is present in the TLB, the CAM search yields a match quickly and the retrieved physical address can be used to access memory. This is called a TLB hit. If the reques ...more...



VideoCore

topic

VideoCore is a low-power mobile multimedia processor originally developed by Alphamosaic Ltd and now owned by Broadcom . Its two-dimensional DSP architecture makes it flexible and efficient enough to decode (as well as encode) a number of multimedia codecs in software while maintaining low power usage. The semiconductor intellectual property core (SIP core) has been found so far only on Broadcom SoCs. Technical details Multimedia system constraints Mobile multimedia devices require a lot of high-speed video processing, but at low power for long battery life. The ARM processor core has a high IPS per watt figure (and thus dominates the mobile phone market), but requires video acceleration coprocessors and display controllers for a complete system. The amount of data passing between these chips at high speed results in higher power consumption. Specialised co-processors may be optimised for throughput over latency (more cores and data parallelism, but at a lower clock speed), and have instruction-sets and memo ...more...



Microprocessor

topic

A Japanese manufactured HuC6260A microprocessor Microprocessors can be recycled . STM32 microprocessor A microprocessor is a computer processor which incorporates the functions of a computer 's central processing unit (CPU) on a single integrated circuit (IC) , or at most a few integrated circuits. The microprocessor is a multipurpose, clock driven, register based, digital-integrated circuit which accepts binary data as input, processes it according to instructions stored in its memory , and provides results as output. Microprocessors contain both combinational logic and sequential digital logic . Microprocessors operate on numbers and symbols represented in the binary numeral system . The integration of a whole CPU onto a single chip or on a few chips greatly reduced the cost of processing power, increasing efficiency. Integrated circuit processors are produced in large numbers by highly automated processes resulting in a low per unit cost. Single-chip processors increase reliability as there are many fewe ...more...



Oracle VM Server for SPARC

topic

Logical Domains ( LDoms or LDOM ) is the server virtualization and partitioning technology for SPARC V9 processors. It was first released by Sun Microsystems in April 2007. After the Oracle acquisition of Sun in January 2010, the product has been re-branded as Oracle VM Server for SPARC from version 2.0 onwards. Each domain is a full virtual machine with a reconfigurable subset of hardware resources. Domains can be securely live migrated between servers while running. Operating systems running inside Logical Domains can be started, stopped, and rebooted independently. A running domain can be dynamically reconfigured to add or remove CPUs, RAM, or I/O devices without requiring a reboot. Supported hardware SPARC hypervisors run in hyperprivileged execution mode, which was introduced in the sun4v architecture. The sun4v processors released as of October 2015 are the UltraSPARC T1 , T2 , T2+ , T3 , T4 , T5 , M5, M6, M10, and M7. Systems based on UltraSPARC T1 support only Logical Domains versions 1.0-1.2. The ne ...more...



Prime95

topic

Prime95 is the freeware application written by George Woltman that is used by GIMPS , a distributed computing project dedicated to finding new Mersenne prime numbers. More specifically, Prime95 refers to the Windows and macOS versions of the software. MPrime is the Linux command-line interface version of Prime95, to be run in a text terminal or in a terminal emulator window as a remote shell client. It is identical to Prime95 in functionality, except it lacks a graphical user interface . Although most of the GIMPS software's source code is publicly available, it is technically not free software as users must abide by the project's distribution terms if the software is used to discover a prime number with at least 100,000,000 decimal digits and wins the $150,000 bounty offered by the EFF . As such, a user who uses Prime95 to discover a qualifying prime number would not be able to claim the prize directly ($50,000 will go to the person who finds the prime number, $50,000 will go to a mathematics-related charit ...more...



Test-and-set

topic

In computer science, the test-and-set instruction is an instruction used to write 1 (set) to a memory location and return its old value as a single atomic (i.e., non-interruptible) operation. If multiple processes may access the same memory location, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process's test-and-set is finished. A CPU may use a test-and-set instruction offered by another electronic component, such as dual-port RAM; a CPU itself may also offer a test-and-set instruction. A lock can be built using an atomic test-and-set instruction as follows: function Lock(boolean *lock) { while (test_and_set(lock) == 1); } The calling process obtains the lock if the old value was 0; otherwise, the while-loop spins, waiting to acquire the lock. This is called a spinlock. "Test and test-and-set" is another example. Maurice Herlihy (1991) proved that test-and-set has a finite consensus number and can solve the wait-free consensus p ...more...



Duff's device

topic

In the C programming language, Duff's device is a way of manually implementing loop unrolling by interleaving two syntactic constructs of C: the do-while loop and the switch statement. Its discovery is credited to Tom Duff in November 1983, when Duff was working for Lucasfilm and used it to speed up a real-time animation program. Loop unrolling attempts to reduce the overhead of conditional branching needed to check whether a loop is done, by executing a batch of loop bodies per iteration. To handle cases where the number of iterations is not divisible by the unrolled-loop increment, a common technique among assembly language programmers is to jump directly into the middle of the unrolled loop body to handle the remainder. Duff implemented this technique in C by using C's case-label fall-through feature to jump into the unrolled body.

Original version

Duff's problem was to copy 16-bit units ("shorts" in most C implementations) from an array into a memory-mapped output register, denoted in C by a pointer. ...more...



Time Stamp Counter

topic

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of cycles since reset. The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the higher 32 bits of RAX and RDX. Its opcode is 0F 31. Pentium competitors such as the Cyrix 6x86 did not always have a TSC and might treat RDTSC as an illegal instruction; Cyrix later included a Time Stamp Counter in their MII.

Use

The Time Stamp Counter was once an excellent high-resolution, low-overhead way for a program to get CPU timing information. With the advent of multi-core/hyper-threaded CPUs, systems with multiple CPUs, and hibernating operating systems, the TSC cannot be relied upon to provide accurate results — unless great care is taken to correct for the possible flaws: the rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synch ...more...



DragonFly BSD

topic

DragonFly BSD is a free and open source Unix-like operating system created as a fork of FreeBSD 4.8. Matthew Dillon, an Amiga developer in the late 1980s and early 1990s and a FreeBSD developer between 1994 and 2003, began work on DragonFly BSD in June 2003 and announced it on the FreeBSD mailing lists on 16 July 2003. Dillon started DragonFly in the belief that the methods and techniques being adopted for threading and symmetric multiprocessing in FreeBSD 5 would lead to poor system performance and cause maintenance difficulties. He sought to correct these suspected problems within the FreeBSD project. Due to ongoing conflicts with other FreeBSD developers over the implementation of his ideas, his ability to directly change the FreeBSD codebase was eventually revoked. Despite this, the DragonFly BSD and FreeBSD projects still work together contributing bug fixes, driver updates, and other system improvements to each other. Intended to be the logical continuation of the FreeBSD 4.x series, DragonFly's deve ...more...



Reentrancy (computing)

topic

In computing , a computer program or subroutine is called reentrant if it can be interrupted in the middle of its execution, and then be safely called again ("re-entered") before its previous invocations complete execution. The interruption could be caused by an internal action such as a jump or call, or by an external action such as an interrupt or signal . Once the reentered invocation completes, the previous invocations will resume correct execution. This definition originates from single-threaded programming environments where the flow of control could be interrupted by an interrupt and transferred to an interrupt service routine (ISR). Any subroutine used by the ISR that could potentially have been executing when the interrupt was triggered should be reentrant. Often, subroutines accessible via the operating system kernel are not reentrant. Hence, interrupt service routines are limited in the actions they can perform; for instance, they are usually restricted from accessing the file system and sometimes ...more...



Optimizing compiler

topic

In computing , an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program ; a less common one is to minimize the amount of memory occupied. The growth of portable computers has created a market for minimizing the power consumed by a program. Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources. It has been shown that some code optimization problems are NP-complete , or even undecidable . In practice, factors such as the programmer 's willingness to wait for the compiler to complete its task place upper limits on the optimizations that a compiler implementor might provide. (Optimization is generally a very CPU - and memory-intensive process.) In the past, computer memory limitations were also a major fac ...more...



Computer cluster

topic

Technicians working on a large Linux cluster at the Chemnitz University of Technology, Germany

Sun Microsystems Solaris Cluster

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The components of a cluster are usually connected to each other through fast local area networks, with each node (computer used as a server) running its own instance of an operating system. In most circumstances, all of the nodes use the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application Resources (OSCAR)), different operating systems can be used on each computer, or different hardware. Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single ...more...



EKA2

topic

EKA2 (EPOC Kernel Architecture 2) is the second-generation Symbian platform kernel. Like its predecessor, EKA1, it has pre-emptive multithreading and full memory protection. The main differences are:

- Real-time guarantees (each API call is quick, but more importantly, time-bound)
- Multiple threads inside the kernel as well as outside
- Pluggable memory models, allowing better support for later generations of the ARM instruction set
- A "nanokernel" which provides the most basic OS facilities upon which other "personality layers" can be built

The user-side interface of EKA2 is almost completely compatible with EKA1 -- though EKA1 has not been used since Symbian OS v8.1 (which was superseded in 2005). The main advantage of EKA2 was its ability to run full telephone signalling stacks. Previously, on Symbian phones, these had to run on a separate CPU. Such signalling stacks are extremely complex and rewriting them to work natively on Symbian OS is typically not an option. EKA2 therefore allows "personality layers" to ...more...



Java ConcurrentMap

topic

The Java programming language's Java Collections Framework version 1.5 and later defines and implements the original regular single-threaded Maps, and also new thread-safe Maps implementing the java.util.ConcurrentMap interface, among other concurrent interfaces. In Java 1.6, the java.util.NavigableMap interface was added, extending java.util.SortedMap, and the java.util.ConcurrentNavigableMap interface was added as a subinterface combination.

Java Map Interfaces

The version 1.8 Map interface diagram has the shape below. Sets can be considered sub-cases of corresponding Maps in which the values are always a particular constant that can be ignored, although the Set API uses corresponding but differently named methods. At the bottom is java.util.concurrent.ConcurrentNavigableMap, which uses multiple inheritance.

java.util.Collection
java.util.Map
java.util.SortedMap
java.util.NavigableMap
java.util.concurrent.ConcurrentNavigableMap
java.util.concurrent.ConcurrentMap
java.util.concurrent.ConcurrentNavigable ...more...



X86 memory segmentation

topic

x86 memory segmentation refers to the implementation of memory segmentation in the Intel x86 computer instruction set architecture. Segmentation was introduced on the Intel 8086 in 1978 as a way to allow programs to address more than 64 KB (65,536 bytes) of memory. The Intel 80286 introduced a second version of segmentation in 1982 that added support for virtual memory and memory protection. At this point the original model was renamed real mode, and the new version was named protected mode. The x86-64 architecture, introduced in 2003, has largely dropped support for segmentation in 64-bit mode. In both real and protected modes, the system uses 16-bit segment registers to derive the actual memory address. In real mode, the registers CS, DS, SS, and ES point to the currently used program code segment (CS), the current data segment (DS), the current stack segment (SS), and one extra segment determined by the programmer (ES). The Intel 80386, introduced in 1985, added two additional segment registers, FS a ...more...



Coffee Lake

topic

Coffee Lake is Intel's codename for the second 14 nm process refinement, following both Skylake and Kaby Lake. Coffee Lake was initially rumored to consist of 15/28-watt quad-core U-chips with GT3e or GT2 graphics and 35–45-watt H-series chips with GT3e and up to six cores. The integrated graphics on these Coffee Lake chips support DP 1.2 to HDMI 2.0 and HDCP 2.2 connectivity. Coffee Lake natively supports DDR4-2666 MHz memory in dual-channel mode. Desktop Coffee Lake CPUs introduce a major change in Intel's Core CPU nomenclature, in that i5 and i7 CPUs feature six cores (along with hyper-threading in the case of the latter). i3 CPUs, having four cores and dropping hyper-threading for the first time, received a change as well. The chips were released on October 5, 2017. Coffee Lake is used in conjunction with the 300-series chipset, and will not work with the 100- and 200-series chipsets. Although desktop Coffee Lake processors use the same physical LGA 1151 socket as Skylake and Kaby Lake, the pinout ...more...



Arithmetic logic unit

topic

An arithmetic logic unit (ALU) is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating-point numbers. An ALU is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs. The inputs to an ALU are the data to be operated on, called operands, and a code indicating the operation to be performed; the ALU's output is the result of the performed operation. In many designs, the ALU also has status inputs or outputs, or both, which convey information about a previous operation or the current operation, respectively, between the ALU and external status registers.

Signals

An ALU has a variety of input and output nets, which are the electrical conductors used to convey digital signals between the ALU and external c ...more...



Jacquard loom

topic

This portrait of Jacquard was woven in silk on a Jacquard loom and required 24,000 punched cards to create (1839). It was only produced to order. Charles Babbage owned one of these portraits; it inspired him in using perforated cards in his analytical engine. It is in the collection of the Science Museum in London, England.

Jacquard looms in the Textile Department of the Strzemiński Academy of Fine Arts in Łódź, Poland

Weaving on a jacquard loom with a flying shuttle at the Textile Department of the Strzemiński Academy of Fine Arts in Łódź, Poland

The Jacquard machine is a device fitted to a power loom that simplifies the process of manufacturing textiles with such complex patterns as brocade, damask and matelassé. It was invented by Joseph Marie Jacquard in 1804. The loom was controlled by a "chain of cards"; a number of punched cards laced together into a continuous sequence. Multiple rows of holes were punched on each card, with one complete card corresponding to one row of the desi ...more...



Calling convention

topic

In computer science, a calling convention is an implementation-level (low-level) scheme for how subroutines receive parameters from their caller and how they return a result. Differences in various implementations include where parameters, return values, return addresses and scope links are placed, and how the tasks of preparing for a function call and restoring the environment afterward are divided between the caller and the callee. Calling conventions may be related to a particular programming language's evaluation strategy, but most often are not considered part of it (or vice versa), as the evaluation strategy is usually defined on a higher abstraction level and seen as a part of the language rather than as a low-level implementation detail of a particular language's compiler.

Variations

Calling conventions may differ in:

- Where parameters, return values and return addresses are placed (in registers, on the call stack, a mix of both, or in other memory structures)
- The order in which actual arguments fo ...more...



List of Macintosh models grouped by CPU type

topic

This list of Macintosh models grouped by CPU type contains all CPUs used by Apple Inc. for their Macintosh computers. It is grouped by processor family, processor model, and then chronologically by Macintosh model.

Motorola 68000

A Motorola 68000 processor using DIPP packaging, as the early Macintosh models used

The Motorola 68000 was the first Apple Macintosh processor. It had 32-bit CPU registers, a 24-bit address bus, and a 16-bit data path; Motorola referred to it as a "16-/32-bit microprocessor".

Processor  Model              Clock speed (MHz)  FSB speed (MT/s)  L1 cache (bytes)  Introduced      Discontinued
MC68000    Lisa               5                  5                 —                 January 1983    January 1984
           Lisa 2             5                  5                 —                 January 1984    January 1985
           Macintosh          8                  8                 —                 January 1984    October 1985
           Macintosh 512K     8                  8                 —                 September 1984  April 1986
           Macintosh XL       5                  5                 —                 January 1985    April 1985
           Macintosh Plus     8                  8                 —                 January 1986    October 1990
           Macintosh 512Ke    8                  8                 —                 April 1986      September 1987
           Macintosh SE       8                  8                 —                 March 1987      August 1989
           Macintosh SE FDHD  8                  8                 —                 August 1989     October 1990
           Mac ...more...



JTAG

topic

The Joint Test Action Group ( JTAG ) is an electronics industry association formed in 1985 for developing a method of verifying designs and testing printed circuit boards after manufacture. In 1990 the Institute of Electrical and Electronics Engineers codified the results of the effort in IEEE Standard 1149.1-1990, entitled Standard Test Access Port and Boundary-Scan Architecture. JTAG implements standards for on-chip instrumentation in electronic design automation (EDA) as a complementary tool to digital simulation . It specifies the use of a dedicated debug port implementing a serial communications interface for low-overhead access without requiring direct external access to the system address and data buses. The interface connects to an on-chip test access port (TAP) that implements a stateful protocol to access a set of test registers that present chip logic levels and device capabilities of various parts. The JTAG standards have been extended by many semiconductor chip manufacturers with specialized var ...more...



Real-time operating system

topic

A real-time operating system (RTOS) is an operating system (OS) intended to serve real-time applications that process data as it comes in, typically without buffer delays. Processing time requirements (including any OS delay) are measured in tenths of seconds or shorter increments of time. A real-time system is a time-bound system with well-defined, fixed time constraints; processing must be done within those constraints or the system will fail. Such systems are either event-driven or time-sharing: event-driven systems switch between tasks based on their priorities, while time-sharing systems switch tasks based on clock interrupts. A key characteristic of an RTOS is the level of its consistency concerning the amount of time it takes to accept and complete an application's task; the variability is jitter. A hard real-time operating system has less jitter than a soft real-time operating system. The chief design goal is not high throughput, but rather a guarantee of a soft or hard performance category. ...more...



Concurrent computing

topic

Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts). Concurrency is a property of a system—which may be an individual program, a computer, or a network—in which there is a separate execution point or "thread of control" for each computation ("process"). A concurrent system is one where a computation can advance without waiting for all other computations to complete. As a programming paradigm, concurrent computing is a form of modular programming, namely factoring an overall computation into subcomputations that may be executed concurrently. Pioneers in the field of concurrent computing include Edsger Dijkstra, Per Brinch Hansen, and C.A.R. Hoare.

Introduction

The concept of concurrent computing is frequently confused with the related but distinct concept of parallel computing, although both can be described as "multiple processes executing during the same period of ti ...more...



Framewave

topic

Framewave (formerly the AMD Performance Library (APL)) is computer software, a high-performance optimized programming library, consisting of low-level application programming interfaces (APIs) for image processing, signal processing, JPEG, and video functions. These APIs are programmed with task-level parallelization (multi-threading) and instruction-level parallelism via single instruction, multiple data (SIMD) for maximum performance on multi-core processors from Advanced Micro Devices (AMD). Framewave is free and open-source software released under the Apache License version 2.0, which is compatible with the GNU General Public License (GPL) 3.0.

Overview

The AMD Performance Library was developed by Advanced Micro Devices (AMD) as a collection of popular software routines designed to accelerate application development, debugging, and optimization on x86-class processors. It includes simple arithmetic routines, and more complex functions for applications such as image and signal processing. APL is availab ...more...



Semaphore (programming)

topic

In computer science, a semaphore is a variable or abstract data type used to control access to a common resource by multiple processes in a concurrent system, such as a multiprogramming operating system. A trivial semaphore is a plain variable that is changed (for example, incremented or decremented, or toggled) depending on programmer-defined conditions. The variable is then used as a condition to control access to some system resource. A useful way to think of a semaphore as used in real-world systems is as a record of how many units of a particular resource are available, coupled with operations to adjust that record safely (i.e. to avoid race conditions) as units are acquired or become free, and, if necessary, to wait until a unit of the resource becomes available. Semaphores are a useful tool in the prevention of race conditions; however, their use is by no means a guarantee that a program is free from these problems. Semaphores which allow an arbitrary resource count are called counting semaphores, w ...more...



Transputer

topic

T414 transputer chip

IMSB008 base platform with IMSB419 and IMSB404 modules mounted

The transputer is a series of pioneering microprocessors from the 1980s, featuring integrated memory and serial communication links, intended for parallel computing. It was designed and produced by Inmos, a semiconductor company based in Bristol, United Kingdom. For some time in the late 1980s, many considered the transputer to be the next great design for the future of computing. While Inmos and the transputer did not ultimately live up to this expectation, the transputer architecture was highly influential in provoking new ideas in computer architecture, several of which have re-emerged in different forms in modern systems.

Background

In the early 1980s, conventional CPUs appeared to reach a performance limit. Up to that time, manufacturing difficulties limited the amount of circuitry that designers could place on a chip. Continued improvements in the fabrication process, however, removed this restriction. Soon the pro ...more...



POWER7

topic

POWER7 is a family of superscalar symmetric multiprocessors based on the Power Architecture, released in 2010, that succeeded the POWER6. POWER7 was developed by IBM at several sites, including IBM's Rochester, MN; Austin, TX; Essex Junction, VT; T. J. Watson Research Center, NY; Bromont, QC; and IBM Deutschland Research & Development GmbH, Böblingen, Germany laboratories. IBM announced servers based on POWER7 on 8 February 2010.

History

IBM won a $244 million DARPA contract in November 2006 to develop a petascale supercomputer architecture before the end of 2010 in the HPCS project. The contract also states that the architecture shall be available commercially. IBM's proposal, PERCS (Productive, Easy-to-use, Reliable Computer System), which won them the contract, is based on the POWER7 processor, AIX operating system and General Parallel File System. One feature that IBM and DARPA collaborated on is modifying the addressing and page table hardware to support global shared memory space for POWER7 clust ...more...



