Development of general purpose processors (CISC -> RISC -> ???)

zhu1982lin · Aug 30, 2018

After CISC and RISC, the development of general-purpose processor architecture has basically stalled.

The CISC architecture, represented by the x86 instruction set, holds on to its territory for the sake of compatibility. Apart from a few companies, no one develops it, and its own structure is evolving very slowly. No new company is currently developing a pure CISC chip.

The RISC architecture also faces a big dilemma. Its theory is based on the statistical observation that "80% of the instructions executed by a typical program come from only 20% of a processor's instruction set." But it offers no solution to the question, "What should be done with the 80% of the instruction set that is not commonly used?" The choice is simply made case by case when a chip is designed.

With the rapid improvement of wafer technology, which in some respects is equivalent to an increase in chip area, processor functions no longer have to be budgeted as stingily as before. Clearly, the original RISC idea is no longer appropriate. (However, the pipeline technology developed on the basis of the RISC architecture, in which the execution of multiple instructions overlaps, must be retained.) Some RISC processors even include branch prediction. Is that still a RISC processor? And when one processor core is not enough, designers just put more than one core on a chip. It can thus be seen that current general-purpose processor architecture has fallen far behind the development of wafer technology.

Why do a lot of people nowadays say that "general-purpose processor architecture is very mature"? Because there is no relatively new design structure, and the various functional modules under the current architecture have been studied thoroughly. But is that really the case? What architecture will emerge after the development from CISC to RISC? Something you do not know about does not necessarily not exist; it is worth keeping a sense of awe.

The rapid development of GPUs has left general-purpose processors far behind in parallel computing. What about general-purpose processors? Today it looks like the CPU and GPU are just glued together, with no better way to fuse them. How can a GPU for parallel processing of big data be integrated naturally into a general-purpose processor?

PS: If the above contains typos, improper wording, wrong content, faulty logic, etc., please correct me.

-------------------------------------------------------------------------------------------------------
With all that said, it's clear that I have my own solution for what's missing from today's general-purpose processors, but it's not convenient to post it here. I named the general-purpose processor architecture I designed ZISC (Zhu's Instruction Set Computer, 祝氏指令集计算机). It's not up to me to decide whether my solution makes sense, or whether it really belongs to the next generation of general-purpose processor architectures.

----------------------------------- cut-off rule ---------------------------------------------------------

In 2007, I implemented a very simple processor based on the ZISC architecture on an FPGA development board as a validation model. It worked when running assembled programs. (These were a few simple assembly programs. At the time I bought a book and, following it, wrote a lexical-analysis program that translated the assembly into binary code.) That processor model had 8 cores.

Email: zhu1982lin@gmail.com

Arthur Hanson · Aug 31, 2018

Do you have any opinion on the Automata architecture?

Tanj · Aug 31, 2018

RISC did have a guiding principle, which rested on the assumption that the whole CPU ran on a single clock. So the basis for the original 801 RISC at IBM was that if the average speedup due to adding an instruction, over a set of benchmarks, was less than the slowdown in the clock due to the extra size caused by implementing the instruction, then it was not worth having. Of course this idea is difficult to measure exactly, but the guiding principle is easy to grasp.
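A minimal sketch of that rule, to make the trade-off concrete. The function names, the use of a geometric mean as the "average", and the sample numbers are all assumptions for illustration; they are not taken from the 801 project.

```python
from math import prod


def net_speedup(cycle_ratios, cycle_time_ratio):
    """Net effect of adding one instruction (hypothetical helper).

    cycle_ratios: per-benchmark old_cycles / new_cycles
                  (> 1.0 means the new instruction saves cycles).
    cycle_time_ratio: new_cycle_time / old_cycle_time
                  (> 1.0 means the clock got slower).
    The geometric mean is an assumed choice of "average".
    """
    geo_mean = prod(cycle_ratios) ** (1.0 / len(cycle_ratios))
    return geo_mean / cycle_time_ratio


def worth_adding(cycle_ratios, cycle_time_ratio):
    # Keep the instruction only if the average speedup beats the clock slowdown.
    return net_speedup(cycle_ratios, cycle_time_ratio) > 1.0


# Illustrative numbers only: about 2% fewer cycles, but a 5% slower clock.
print(worth_adding([1.02, 1.01, 1.03], 1.05))
# About 9% fewer cycles for a 2% slower clock.
print(worth_adding([1.10, 1.08], 1.02))
```

In practice the hard part is measuring both ratios, which is the difficulty mentioned above.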

When it comes to modern chips with more than 800 sq mm and multiple kinds of accelerator in them, the principle is quite different. The various parts of the chip may clock independently, with a mesh to interconnect them. Accelerators are only turned on part of the time. The locality of clocks addresses the original IBM 801 concern. And with accelerators you can suggest a different rule: did you save power? The chips are power limited. Some accelerators can run at 100x lower power for the same result as a general purpose CPU, mostly due to organizing data flow to be local, since data fetches consume much more energy than computation. So, if you can offload a task to an efficient accelerator and save 5% of your power, and that took less than 5% of your chip area, you have a candidate for inclusion. If the activity now has lower latency, can be used in more scenarios, or has greater throughput, then these are bonuses which may seal the case for including the function. And if that function is not in use 90% of the time, who cares? Your chip is bound to have dark silicon, since running all functions all the time would melt it. So long as the function does not waste power when idle, black is the new opportunity.

Compare it to memory, where 99% of your memory is not used at any one time, maybe even over the course of a typical second. Overall, your memory is like accelerators. Very useful for specific results, idle mostly. Get used to it.
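The accelerator inclusion rule described above can be sketched as a simple check. This is one reading of the rule, not a design method from the post: the `Accelerator` fields, the `max_idle_power` threshold, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    power_saved: float  # fraction of total chip power saved by offloading
    area_cost: float    # fraction of total chip area the block occupies
    idle_power: float   # fraction of chip power drawn while the block is idle


def is_candidate(acc: Accelerator, max_idle_power: float = 0.001) -> bool:
    """Candidate for inclusion if it saves a larger share of power than the
    share of area it costs, and does not waste power when idle.
    max_idle_power is an assumed threshold, not a figure from the thread."""
    return acc.power_saved > acc.area_cost and acc.idle_power <= max_idle_power


# Saves 5% of power for 3% of area and is power-gated when dark.
good = Accelerator("offload_engine", power_saved=0.05, area_cost=0.03, idle_power=0.0)
# Saves less power than the area it takes.
poor = Accelerator("rarely_useful", power_saved=0.01, area_cost=0.04, idle_power=0.0)

print(is_candidate(good), is_candidate(poor))
```

Note that utilization does not appear in the check at all: a block that sits dark most of the time still qualifies, as long as it costs nothing while idle.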

zhu1982lin · Sep 2, 2018

In my opinion, all current processors are RISC (x86 converts internally to RISC-like instructions). The biggest problem facing current chips is that there is no guiding principle. Processes at 7nm or even smaller are leading designers back onto the old path of CISC, in order to convert the excess wafer area into faster execution speed.

An efficient system should coordinate the various components within the chip like a factory assembly line, with nothing left idle.

The heat dissipation problem caused by running all components of the CPU at full speed is a separate subject and is not what is being discussed here. Memory is a storage unit and operates in a different mode from the CPU chip, so the idleness of memory cannot be compared with that of the CPU.

(Translated with Google Translate; if there is a problem, please point it out.)

Tanj · Sep 6, 2018

The components are not just on the CPU chip; there are accelerators all over the place. The instruction set and the cores have very little to do with coordinating the work. It is much more about how the fabric works to connect units, what sort of bandwidth it has, how it connects to IO pins, how doorbells and interrupts work, whether there is acceleration for queuing like RDMA or NVMe, whether there is coherence on the IO, how many sockets can be connected through the mesh extender, whether you are set up to scale to multiple die in a package, etc. A new core and instruction set may be interesting for a specialized accelerator, but completely misses the point for a general CPU now.

zhu1982lin · Sep 6, 2018

Currently, because the structure is too old, it is not suited to the development of wafer process technology. That leads to all kinds of accelerators. These accelerators can increase CPU speed, but not by much. The return on investment is too low.
My architecture is better than the current general-purpose processor architectures. It can adapt to the current process (7nm), and it can deliver much faster computing speed than current chips.

Tanj · Sep 7, 2018

Accelerators do not increase CPU speed. The ones I am talking about are completely separate from the CPU cores, and leverage the same high bandwidth fabric the cores are attached to. The core is only a small part of the modern computer. Its speed is more a consequence of memory bandwidth, latency, and caches than it is about the instruction set, ALU path, or registers. Those "classic" computation things occupy a tiny part of the modern die, and the reason all that other stuff is on the die is because it is important.

zhu1982lin · Sep 8, 2018

I am very sorry, I did not understand your question, and my answer was beside the point.
My architecture only covers the core part of the CPU. IO pins, interrupts, caches, etc. are not included (not that they are absent, but their design principles are the same as before).
I have no experience with the design of IO pins, interrupts, caches, etc. If that is what you are asking about, I am not able to answer you.

(Translated with Google Translate from the Chinese original.)

Tanj · Sep 8, 2018

You may have no experience of how to design a cache or the communication fabric or those other things, that is ok, but to be an architect and suggest changes you do need to understand how those things are used. Architecture is about balance. Anyone with decent training and tools can define a core that runs fast. But it does not run usefully unless it connects to other parts of the system. If you look at a modern SOC and drill down to see what all the elements are you will typically find the CPU cores are just a few percent. As for power, the energy needed to do a floating point multiply is easily 100x smaller than the energy to move the values from and to memory. So, low power and fast are no longer about the instruction set or the core. Those were important 30 years ago. To have an architecture today you need to focus on how to move data efficiently between stations, how to make accelerators which have pipelines designed so data flows where it is needed on short wires, how to balance a diversity of function units. 30 years ago the CPU core was a "one man band". Now, there are many cores and they are just part of an orchestra of specialization. They might not even be the conductor.
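A rough sketch of why the data-movement point dominates. It takes the "100x" ratio from the post as an input and treats everything else as an assumption: the function name, the unit of "one multiply", and the idea of counting how many operations reuse each value fetched from memory.

```python
def compute_energy_share(move_to_compute_ratio: float, ops_per_transfer: float) -> float:
    """Fraction of total energy that goes into arithmetic (hypothetical model).

    Energy is measured in units of one floating point multiply.
    move_to_compute_ratio: energy of one memory transfer relative to one multiply.
    ops_per_transfer: how many multiplies are done per value moved (data reuse).
    """
    compute = ops_per_transfer * 1.0
    movement = move_to_compute_ratio
    return compute / (compute + movement)


# No reuse: every multiply pays for a trip to memory.
print(compute_energy_share(100, 1))    # roughly 1% of the energy is arithmetic
# Local data flow: each fetched value feeds 100 multiplies.
print(compute_energy_share(100, 100))  # half of the energy is arithmetic
```

Under this model, making the multiplier itself more efficient changes almost nothing when there is no reuse, while keeping data local shifts the balance substantially.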

PabloMack · Apr 8, 2020

I think you are right about CISC development having stalled a long time ago. I have been developing a new CISC architecture over the past year. It is the culmination of three decades of experience in programming (including writing a compiler), designing electronics and writing small embedded operating systems. I call it ϕEngine. But it is not just a 64-bit CPU. Its strongest feature is its unique virtual memory management, which is revolutionary. Its code density is very good and it has many RISC-like features, including a large set of General Purpose Registers (larger than most RISC processors). Its utilization of a given amount of memory and registers is significantly higher than for most other processors. So it can do more with less, and that translates to faster and smaller programs. The thing that makes it a CISC is its variable instruction length. This ensures that immediate values and offsets can be any of the sizes needed for any instruction. The operations are very orthogonal. ϕEngine was not developed in isolation but was co-developed along with a parallel programming language (ϕPPL) and with a real-time database-oriented operating system in mind (ϕOS), which will be written in ϕPPL. It also implicitly implements vector and array operations, which really can't be done in C/C++ because of their broken pointer syntax. ϕPPL will come closer to eliminating the need for assembly language than has ever been achieved before. It can do a lot more low-level things than popular HLLs can, such as catching and using carries and overflows on adds and subtracts. This is made possible because the source code is based on a superset of full Unicode. ϕPPL is also much better suited as the basis for an HDL since its typing system is built on bits, not bytes. But all of the predefined data types are multiples of a byte, and ϕEngine takes good advantage of this.
