
Android Auto-Rooting Malware – You Can Run But You Can’t Hide
by Bernard Murphy on 07-14-2016 at 7:00 am

There has been a startling rise in a class of Android auto-rooting malware believed to affect over a quarter of a million phones in the US and well over a million in each of India and China. So far the attack has primarily infected older versions of Android – KitKat, Jelly Bean and Lollipop.

The malware, known as Shedun or HummingBad, is believed to be produced by the Chinese mobile ad-server company Yingmob; it primarily installs fraudulent apps and serves malicious ads. Yingmob today generates healthy revenue purely from these services, but having root access to millions of Android devices obviously allows it to expand into even more malicious services in support of cyber-criminals, state actors and others.

The malware seems to arrive, at least in some cases, through drive-by download. You visit a website (porn websites are apparently notorious for this) and the software installs without you having to accept any download. Once downloaded, the exploit gains root access to the host phone and installs itself as system software.

The exploit is quite sophisticated in its installation and is nearly impossible to remove. Among other things it updates the recovery information, so even if you do a recovery on the phone, the malware is restored along with the rest of the software. It seems the only cures are to reflash the ROM or buy a new phone. Users are advised to live a virtuous life (stay away from porn sites) and to bar all downloads from outside Google Play; that action alone apparently reduces success rates for Android malware in general.

You can read more HERE.

More articles by Bernard…


Time-saving modules expand Prototype Ready family
by Don Dingee on 07-13-2016 at 4:00 pm

A big advantage of FPGA-based prototyping is the ability to run real-world I/O at speed, significantly faster and more accurately than hardware emulation systems, which typically require a protocol adapter. Dealing with real-world I/O means more thorough verification of SoC integration, and the opportunity to optimize systems pre-silicon. Continue reading “Time-saving modules expand Prototype Ready family”


CEVA Launch CDNN2, Embedded Ready Neural Network
by Eric Esteve on 07-13-2016 at 12:00 pm

Convolutional Neural Network (CNN) recognition algorithms are generating very high interest in the semiconductor industry, first of all because CNNs provide the best recognition quality compared with alternative recognition algorithms. The CEVA Deep Neural Network (CDNN) software framework, implemented on the CEVA-XM4 imaging & vision DSP, accelerates machine learning deployment for embedded systems. Neural network algorithms, used for any cognitive processing task, visual or audio, resemble processing in the human brain and genuinely deserve the Artificial Intelligence (AI) label.

These networks develop over time, as data is collected and analyzed and through a training phase in which the convolutional network learns new object types from examples. The CNN approach selected by CEVA is a deep learning neural network, a family of neural network methods that use a high number of layers (hence “deep”) and focus on learned feature representations. With CDNN2, CEVA offers improved capabilities and performance for the latest network topologies and layers, including support for Caffe and for TensorFlow, Google’s software library for machine learning.

To support emerging applications like surveillance or ADAS, an embedded system should be capable of running deep learning-based video analytics directly on any camera-enabled device, in real time. This capability is offered by CDNN2 running on the CEVA-XM4 DSP, enabling real-time classification. CDNN2 supports any given layer in any network topology, at any resolution, as trained by Caffe and TensorFlow. CDNN2 can also take the most demanding machine learning networks from pre-trained network to embedded system, including GoogLeNet, VGG, SegNet, AlexNet, ResNet and Network-in-Network (NIN).

The development process for implementing machine learning in embedded systems involves offline training followed by the CEVA Network Generator, enabling real-time classification with pre-trained networks. The Network Generator is push-button, converting pre-trained networks into real-time optimized ones. The process:

1. Receives the network model and weights as input from offline training (via Caffe or TensorFlow)
2. Automatically converts them into a real-time network model via the CEVA Network Generator
3. Uses the real-time network model in CNN applications on the CEVA-XM4
The CEVA Network Generator runs offline and converts the network information (model and weights) into a real-time network model. It optimizes the conversion for power efficiency, generates a fixed-point model from the floating-point one and adapts it to embedded constraints. The Network Generator maintains high accuracy: the converted result shows less than 1% deviation.
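The floating-point to fixed-point step is the part of this flow most easily illustrated outside CEVA's tools. Below is a minimal, generic sketch of post-training quantization of a weight tensor to 8-bit fixed point; it is not the CEVA Network Generator, and the single per-tensor scale is an assumption made purely for illustration.

```python
import numpy as np

def quantize_to_fixed_point(weights, num_bits=8):
    """Generic post-training quantization of a float32 tensor to signed fixed point.

    Illustrative only: CEVA's Network Generator performs a far more elaborate,
    per-layer optimization tuned for the CEVA-XM4.
    """
    qmax = 2 ** (num_bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax     # one scale per tensor (assumed heuristic)
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor so accuracy loss can be measured."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(64, 3, 3, 3).astype(np.float32)   # a toy convolution kernel
    q, scale = quantize_to_fixed_point(w)
    err = np.abs(dequantize(q, scale) - w).mean() / np.abs(w).mean()
    print(f"mean relative error after 8-bit quantization: {err:.2%}")
```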

Deliverables include real-time example models for image classification, localization and object detection. The real-time neural network libraries have been optimized for the CEVA-XM4 vision DSP, supporting various network structures and layers and accepting fixed or variable input sizes.

CDNN2 is the industry’s first software framework for embedded systems to automatically support networks generated by TensorFlow™. Combined with the CEVA-XM4 imaging and vision processor, CDNN2 offers a highly power-efficient deep learning solution for any camera-enabled device and significant time-to-market advantages for implementing machine learning in embedded systems. Compared to a leading GPU-based system, the CEVA solution significantly improves on power consumption and memory bandwidth.

In CDNN the last “N” stands for Network, so let’s take a look at the various convolutional deep neural networks supported by CEVA. This is a partial list, as additional proprietary networks are also supported:

  • AlexNet: linear topology, 24 layers, 224×224 RoI
  • SegNet: multiple-input multiple-output topology, 90 layers, 480×360 RoI
  • GoogLeNet: multiple-input (concatenation layer), multiple layers per level topology, 23 layers + 9 inceptions, 220×220 RoI
  • VGG-19: linear topology, 19 layers, 224×224 RoI

Some of these acronyms deserve explanation. RoI stands for Region of Interest. Searching the web, I found this clarification: “An input image and multiple regions of interest (RoIs) are input into a fully convolutional network. Each RoI is pooled into a fixed-size feature map and then mapped to a feature vector by fully connected layers (FCs). The network has two output vectors per RoI: softmax probabilities and per-class bounding-box regression offsets. The architecture is trained end-to-end with a multi-task loss.”

For AlexNet this means a fixed-size input region of 224×224 pixels.
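To make the quoted description more concrete, here is a minimal single-channel RoI max-pooling sketch in NumPy: it pools an arbitrary region of a feature map down to a fixed grid. The 7×7 output size and the grid-partitioning details are illustrative assumptions, not part of CDNN2.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Pool one region of interest to a fixed (out_h, out_w) grid.

    feature_map: 2-D array (a single channel, for simplicity)
    roi: (y0, x0, y1, x1) in feature-map coordinates
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    # Split the region into an out_h x out_w grid of roughly equal bins
    y_edges = np.linspace(0, h, out_h + 1).astype(int)
    x_edges = np.linspace(0, w, out_w + 1).astype(int)
    pooled = np.zeros((out_h, out_w), dtype=region.dtype)
    for i in range(out_h):
        for j in range(out_w):
            y_lo, y_hi = y_edges[i], max(y_edges[i + 1], y_edges[i] + 1)
            x_lo, x_hi = x_edges[j], max(x_edges[j + 1], x_edges[j] + 1)
            pooled[i, j] = region[y_lo:y_hi, x_lo:x_hi].max()
    return pooled

if __name__ == "__main__":
    fmap = np.random.rand(56, 56).astype(np.float32)
    print(roi_max_pool(fmap, (10, 12, 40, 50)).shape)   # -> (7, 7)
```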

“Inception” appears in the GoogLeNet description; an inception module is essentially “a Network in a Network in a Network…”.

GoogLeNet is the only CDNN-supported network integrating inception, and CEVA claims to be the first DSP IP vendor to support GoogLeNet. Considering the very high interest in CNN algorithms across the industry, particularly in automotive (ADAS), there is little doubt that the CDNN2 framework paired with a low-power DSP like the CEVA-XM4 will see strong adoption in the near future across applications such as smartphones, surveillance, Augmented Reality (AR)/Virtual Reality (VR), drones and, obviously, ADAS.

You will find a complete description of CDNN2, including a PDF presentation and video, here.

By Eric Esteve from IPNEST


Data Analytics alone cannot deliver effective automation solutions for the industrial IOT
by Akeel Attar on 07-13-2016 at 7:00 am

Automated analytics (which can also be referred to as machine learning, deep learning etc.) are currently attracting the lion’s share of interest from investors, consultants, journalists and executives looking at technologies that can deliver the business opportunities being afforded by the Internet of Things. The reason for this surge in interest is that the IOT generates huge volumes of data from which analytics can discover patterns, anomalies and insights which can then be used to automate, improve and control business operations.

One of the main attractions of automated analytics appears to be the perception that it represents an automated process that is able to learn automatically from data without the need to do any programming of rules. Furthermore, it is perceived that the IOT will allow organisations to apply analytics to data generated by any physical asset or business process, and thereafter to use automated analytics to monitor asset performance, detect anomalies and generate problem-resolution / trouble-shooting advice – all without any programming of rules!

In reality, automated analytics is a powerful technology for turning data into actionable insight / knowledge and thereby represents a key enabling technology for automation in Industrial IOT. However, automated analytics alone cannot deliver complete solutions for the following reasons:

i- In order for analytics to learn effectively it needs data that spans the spectrum of normal, sub-normal and anomalous asset/process behaviour. Such data can become available relatively quickly in a scenario where there are tens or hundreds of thousands of similar assets (central heating boilers, mobile phones etc.). However, this is not the case for more complex equipment / plants / processes, where the volume of available fault or anomalous-behaviour data is simply not large enough to facilitate effective analytics learning/modelling. As a result, any generated analytics model will be very restricted in scope and will flag a large number of anomalies that merely represent operating conditions absent from the data.

ii- By focussing on data analytics alone we are ignoring the most important asset of any organisation, namely the expertise of its people in how to operate plants / processes. This expertise covers condition / risk assessment, planning, configuration, diagnostics, trouble-shooting and other skills that involve decision-making tasks. Automating decision making and applying it to streaming real-time IOT data offers huge business benefits and is very complementary to automated analytics, in that it addresses the very areas in point (i) above where data coverage is incomplete but human expertise exists.

Capturing expertise in an automated decision-making system does require the programming of rules and decisions, but that need not be lengthy or cumbersome in a modern rules/decision automation technology such as Xpertrule. Decision-making tasks can be represented in a graphical way that a subject-matter expert can easily author and maintain without the involvement of a programmer, using graphical and easy-to-edit decision flows, decision trees, decision tables and rules. From my experience with this approach, a substantial decision-making task of tens of decision trees can be captured and deployed within a few weeks.
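As a rough illustration of what a captured decision-making task looks like in executable form, here is a minimal hand-authored diagnostic rule sketch in Python. All of the signal names, thresholds and advice strings are invented placeholders, not the actual mill-diagnosis knowledge referred to later in this article, and Xpertrule itself authors such knowledge graphically rather than in code.

```python
def diagnose_mill(reading):
    """Tiny hand-authored fault-diagnosis tree (all conditions are illustrative placeholders)."""
    if reading["motor_current"] > 0.9 * reading["motor_current_rated"]:
        if reading["feed_rate"] > reading["feed_rate_setpoint"]:
            return "Overfeeding suspected: reduce feed rate and re-check motor current."
        return "High load at normal feed: inspect the classifier wheel for blockage."
    if reading["outlet_temperature"] > 80.0:          # degrees C, assumed limit
        return "Outlet over-temperature: check cooling air flow."
    return "No expert rule fired: route the reading to the anomaly-detection model."

if __name__ == "__main__":
    sample = {"motor_current": 47.0, "motor_current_rated": 50.0,
              "feed_rate": 1250.0, "feed_rate_setpoint": 1100.0,
              "outlet_temperature": 65.0}
    print(diagnose_mill(sample))
```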

Given the complementary nature of automated analytics and automated decisions, I would recommend the use of symbolic learning data analytics techniques. Symbolic analytics generates rules/tree structures from data which are interpretable and understandable to domain experts (a small sketch follows the list below). Whilst rules/tree analytics models are marginally less accurate than deep learning or other ‘black box’ models, the transparency of symbolic data models offers a number of advantages:

i- The analytics models can be validated by the domain experts
ii- The domain experts can add additional decision knowledge to the analytics models
iii- The transparency of the data models gives the experts insights into the root causes of problems and highlights opportunities for performance improvement.
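The sketch below shows what such symbolic learning looks like in practice, using scikit-learn's decision-tree learner and its text rule export on synthetic process data. The feature names echo the particle-size example later in the article, but the data and the "ground-truth" rule are entirely made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic process data: two inputs and an 'output particle size in spec?' label.
rng = np.random.default_rng(0)
X = rng.uniform([800.0, 10.0], [1600.0, 40.0], size=(500, 2))   # feed_rate, ambient_temp
y = ((X[:, 0] < 1300.0) & (X[:, 1] < 30.0)).astype(int)         # invented ground-truth rule

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The learned model is a readable rule set a domain expert can validate or extend.
print(export_text(tree, feature_names=["feed_rate", "ambient_temp"]))
```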

Combining automated knowledge from data analytics with automated decisions from domain experts can deliver a paradigm shift in the way organisations use IOT to manage their assets / processes. It allows organisations to deploy their best-practice expertise 24/7, in real time, throughout the organisation and to rapidly turn newly acquired data into new and improved knowledge.

Below are examples of decision and analytics knowledge from an industrial IOT solution that we developed for a major manufacturer of powder-processing mills. The solution monitors the performance of the mills to diagnose problems and to detect anomalous behaviour:

The fault-diagnosis tree below is part of the knowledge captured from the subject-matter experts within the company.

The tree below is generated by automated data analytics and relates the output particle size to other process parameters and environmental variables. The tree is one of many analytics models used to monitor anomalous behaviour of the process.

The above example demonstrates both the complementary nature of rules and analytics automation and the interpretability of symbolic analytics. In my next posting I will cover the subject of the rapid capture of decision making expertise using decision structuring and the induction of decision trees from decision examples provided by subject matter experts.


APP ADD Will Cause the Next Tech Bust, Absolutely!
by Daniel Nenni on 07-12-2016 at 7:00 pm

After playing Pokemon Go with my nephew this weekend I have another solid data point to support my hypothesis that we are in yet another tech boom. Let’s call it fad-based investors jousting with unicorns. Think dotcom bubble of 2000. What drove the dotcom bubble? Cheap money, magical valuations, market overconfidence, and a good old-fashioned speculative boom reminiscent of Black Tuesday (10/29/1929). Sound familiar?

Let’s start with the chart below, which shows the Pokemon Go Android app topping Snapchat, the millennial favorite, by a huge margin, proving once again that today’s consumers have a serious case of APP ADD (mobile application attention deficit disorder). So you really have to ask yourself: Self, how are app-based unicorns like Uber, Snapchat, Airbnb, etc. going to thrive 5 years from now? The answer, of course, is they won’t.


Uber is the easiest one to see through. Autonomous car companies like Tesla, Google, and maybe Apple will attack from the system level, leaving technology-void unicorns without food or shelter. And of course there are dozens of other car-sharing app companies nipping at their hooves. Do you remember when Priceline.com was all the rage? Not so much anymore. The same could be said about eBay and the other bidding sites. I wonder, does Webvan.com qualify as the first tech unicorn death?

The latest example is the LinkedIn acquisition by Microsoft. Do you remember when LinkedIn was all the rage? The same goes for Skype (also acquired by Microsoft) and Craigslist. The difference, of course, is the magical valuations of today’s app unicorns, which leads me to believe that someone is going to get caught holding the bag, and that someone is investors, who will flee the tech industry like rats on the Titanic. Let’s call it “Dotcom Bubble Part II” or “Revenge of the Nerds”.

Now let’s talk about the semiconductor industry. So again, money is cheap, the semiconductor industry forecast is optimistically flat, valuations don’t really matter since money is cheap, and if you want to grow you have to buy revenue, the result being a handful of dominant companies and hundreds of emerging ones, right?

The next big shift amongst the ranks is what I call “Systems: The Transformation of the Semiconductor Industry”. Systems companies (Apple, Amazon, Tesla, Huawei, etc…) are designing their own chips and changing the way the fabless semiconductor ecosystem does business. Now that it is much easier to design and manufacture a semiconductor, who better to design a chip than the company that uses it? By the way, we have come full circle here, because computer “systems” companies started out building fabs so they could design and manufacture their own chips. Intel put them all out of business by producing the first mass-market computer chip. The rest of the story is in the Computer History Museum in Mountain View, California.

So what is a chip company to do? Pretend you are a systems company like Qualcomm? Buy systems companies to feed your chip business like Intel? Or build systems to make your APPs ADD-resistant? The answer of course is all of the above.


Who Really Needs Intel’s New 10 Core, 20 Thread Broadwell-E Core i7 Processor?
by Patrick Moorhead on 07-12-2016 at 5:00 pm

At Computex 2016 in Taipei last week, Intel announced their newest lineup of processors, which included a brand new 10-core Extreme Edition part. The focal point of Intel’s new line-up of Broadwell-E processors, the successor to the previous Haswell-E workstation/enthusiast lineup, is the 10-core Core i7-6950X. Intel has never before released a 10-core, 20-thread processor in their Core line and has only offered such high core counts in their Xeon server processors. The Core i7-6950X is also the first consumer/prosumer product they are offering with as much as 25MB of L3 cache, 25% more than the previous Extreme Edition processor from Intel. These new processors are beasts, but who wants one or needs one? I certainly want one, but I’m not sure why.


Broadwell-E Die Shot (Credit: Intel)


More cores, tastes great

Compared to the previous generation, the new Intel Core i7-6950X is better than its predecessor, the Core i7-5960X, in virtually every way. The 6950X is based on Intel’s 14nm process node while the 5960X is a 22nm processor, which means Intel was able to pack more transistors into the 6950X while still maintaining the same thermal envelope. This rings true because the TDP of both processors is still 140W, even though the 6950X has two more CPU cores at the same clock speeds. Intel also bumped the officially supported memory speed from 2,133 MHz to 2,400 MHz, potentially adding overclocking headroom for those wanting or needing more memory bandwidth. Because Intel is staying on the X99 platform, the number of PCIe lanes (40) and other chipset-tied specs remain unchanged.
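For a back-of-the-envelope sense of what the memory-speed bump is worth, the arithmetic below assumes the X99 platform's quad-channel DDR4 with a 64-bit channel; the numbers are theoretical peaks computed by me, not Intel figures.

```python
# Theoretical peak DRAM bandwidth: transfers/s x bytes per transfer x channels
def peak_bandwidth_gbs(mt_per_s, channels=4, bytes_per_transfer=8):
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

for speed in (2133, 2400):
    print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak, quad-channel")
# DDR4-2133: 68.3 GB/s, DDR4-2400: 76.8 GB/s -- roughly a 12-13% theoretical gain
```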

Individually accelerate one CPU core and pin apps to it
With the new Intel Broadwell-E processors there are some improvements to the overall usability of the processor, like a new version of Turbo dubbed “Turbo Boost Max 3.0.” This new version is capable of finding the best Turbo speed of each individual core and bringing that core to that speed even if it isn’t the same as the rest of the cores. One can also “pin” an application to that sped-up core. Because variance between cores exists, not all cores will clock the same, so Intel rates the cores based on their ability to clock and then assigns those cores certain Turbo speeds. This is valuable for applications that aren’t very well threaded and are integer-bound. Games aren’t very well threaded, so I think this could work well here.
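The "pinning" half of this can already be approximated from user space. The snippet below restricts the current process to a single core on Linux using Python's os.sched_setaffinity; which core is actually the favored one is reported by the Turbo Boost Max 3.0 driver, so the core index here is just a placeholder assumption.

```python
import os

# Pin the current process to one core (Linux only). Core 0 is a placeholder;
# on Broadwell-E the Turbo Boost Max 3.0 driver identifies the real favored core.
FAVORED_CORE = 0

os.sched_setaffinity(0, {FAVORED_CORE})        # pid 0 means "this process"
print("now restricted to cores:", os.sched_getaffinity(0))
```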

Targeted to “megatasking” extreme gamers and content creators
Intel is positioning the new high-end desktop processors, including the Core i7-6950X, towards extreme gamers and content creators. Intel is selling this 10-core processor as something to be used for ‘mega-tasking’, a term I coined nearly a decade ago. Intel is associating the new processors and overall platform with virtual reality, which makes some sense from the content creation standpoint but doesn’t go much beyond that. Intel showed some very strong content creation improvements, but I don’t know exactly where the most efficient place to run this is, on the CPU or the GPU. In the end, it really doesn’t matter how it got a 20% improvement. Intel’s gamer scenario of a 25% improvement is one my son actually does, which is to play a 4K game and stream to Twitch. If I can get my grubby hands on one, I’ll try it out.

The biggest additions of the new platform are the extra CPU cores, increased memory bandwidth and native support for things like Thunderbolt 3, which allows for external graphics connectivity and very fast storage arrays. However, the benefits of Thunderbolt 3 are a little less relevant when you consider that most Broadwell-E based systems will very likely have more than ample local graphics and storage. The primary purpose of Thunderbolt 3 on this desktop platform will be external storage backups for data security purposes.

Stratospheric pricing
Last but not least is probably the most controversial part of Intel’s new processor family, the pricing. The new Core i7-6950X sets a new precedent for the company for Extreme Edition processor pricing. It weighs in at a hefty $1,723 per 1K-unit tray price, which means the final consumer price could be somewhere around $1,800. This new price is a good $700+ more than the previous generation’s top-end processor, the 5960X. There is also the Core i7-6900K, an 8-core processor priced at $999 per 1K-unit tray, and the 6850K and 6800K, priced at $587 and $412 per 1K-unit tray, respectively.

Intel appears to be doing this for a multitude of reasons. First and foremost, they don’t have any competition at the high end, so they get to name their own price. Additionally, Intel most likely wants to protect their Xeons from being undercut by their Core i7 processors, so by pricing the 10-core as high as they do, they discourage anyone from buying a consumer processor instead of a Xeon. Intel won’t win themselves any fans by pricing this processor as they have, but they will preserve their margins and ASPs by doing so. The performance difference between the 6900K and the 6950X is so small that it barely justifies the $700+ price increase, but then again most top-end processors rarely justify their price increases. However, one must also keep in mind that most people who are going to drop $1,000 on a processor are less likely to care whether the processor costs $1,000 or $1,800, as they get bragging rights and can tell their friends they have the “best”. Intel is very likely banking on this with their new pricing, and it is still to be determined how well it is received by the market since it was just announced.

Wrapping up
Overall, the new Broadwell-E based Core i7 processors like the 6950X are designed to set a new bar for high-end desktops. Intel is bringing us new processors with improved performance and features, and while those improvements will no doubt seem incremental to some, they are improvements nonetheless. Many people were excited to see Intel finally launch a 10-core processor, but they just weren’t as excited about the price Intel expects consumers to pay. A prosumer who uses their desktop for both fun and work is more likely to build a system with these new Broadwell-E processors, but the reality is that Intel’s fastest offering simply isn’t that attractive at the price and performance it offers and could have a challenge getting traction.

More from Moor Insights and Strategy


EDA Tool for ATPG – Refactor or Rewrite?
by Daniel Payne on 07-12-2016 at 3:00 pm

In the life of every EDA software tool there comes a moment when new requirements make developers stop and ask: should I continue to refactor the existing code or start over from scratch with a new approach? Synopsys came to that junction when ATPG run times were reaching days or even weeks on the largest IC designs, causing too much pain for SoC designers trying to meet their tapeout schedules and have ATPG vectors ready in time for first silicon samples coming back from the fab. Necessity is the mother of invention, so the engineers at Synopsys took on the epic task of rewriting their popular ATPG tool, TetraMAX, to meet several emerging test challenges:

  • Increase in IC design size
  • New, subtle defects
  • FinFET technology
  • Improving diagnostics
  • Meeting automotive standards
  • Utilizing multi-core workstations
  • Efficient RAM usage

The new ATPG tool has a familiar name: TetraMAX II. We first started hearing about a new ATPG technology from Synopsys in the fall of 2015, and now we have learned that three major sections were rewritten:

  • Pattern generation
  • Silicon diagnostics
  • Fine-grain multi-threading

So what did the rewrite accomplish? Plenty.

  • 10X faster tester pattern generation
  • 25% fewer patterns
  • 3X reduction in RAM
  • ISO 26262 certified for automotive safety

Here’s a quick comparison of TetraMAX II versus the original across several customer designs:

All of the inputs and outputs of TetraMAX II are the same as with the original TetraMAX, so I expect the learning curve will be quite brief for existing ATPG users. Some old commands are now ignored by TetraMAX II, like the command to trade off run time versus pattern count.

How does Synopsys get fewer test patterns in this new ATPG tool? The traditional method of targeting faults in a gate-level netlist is sequential, slow and not really optimized. With TetraMAX II the fault-targeting approach uses something called iCubes, many of which can be generated quickly in parallel on multiple cores. Each iCube is examined to see how it can be combined with other iCubes to detect the maximum number of faults with the fewest patterns. This fault and pattern optimization is all “under the hood” and protected by patents.

ATPG users can now expect that generating patterns on large chip designs will no longer be a bottleneck, and that with more efficient patterns tester time goes down, saving costs; you can also fit your ATPG runs onto machines far more easily thanks to the more efficient use of RAM. Automotive designers will be glad to know that this new Synopsys tool has been certified by SGS-TUV Saar GmbH up to ASIL D requirements as part of the ISO 26262 qualification of the IC test process. Even the way bugs are reported and fixed in this ATPG tool is tracked, with notifications sent out for any safety-related issues, all monitored by a dedicated Synopsys automotive Functional Safety Officer.

So who is actually using TetraMAX II? So far I’ve heard that Toshiba used the tool and found that ATPG run times are shorter, plus they saw some 50% reduction in the number of test patterns. STMicroelectronics is talking about the 10X speed-up in ATPG run times. Expect even more customer quotes over the next year as the installed base of TetraMAX users hears about the revamped TetraMAX II, does an evaluation, and eventually upgrades to get the new benefits.

Summary
Synopsys now offers a second-generation ATPG tool dubbed TetraMAX II; it runs some 10X faster, typically produces 25% fewer patterns, and consumes 3X less RAM. Now the big question is price, and for that you must follow up with your local Synopsys account team to get the details and to start an evaluation.

Related Blogs


Galaxy S7 and the Ongoing Charging Guessing Game
by Bernard Murphy on 07-12-2016 at 7:00 am

In the back-and-forth competition between Samsung and Apple, the Galaxy S7 certainly seems to have notched a few wins over the iPhone 6S. Most reviewers feel the Samsung camera is noticeably superior and the overall look and feel is on a par with or better than the Apple product. I want to focus on just one area where Samsung differs from Apple – in power management (clear advantages) and charging (advantage on paper, less clearly in practice).

Who wins in this area on any given mobile product release comes down to three things: battery size, how quickly you drain that battery in normal use and how conveniently you can recharge. The first is easy to compare – the S7 battery (3,000mAh on the S7, 3,600mAh on the S7 Edge) is larger than the 6S battery (1,810mAh on the 6S, 2,750mAh on the 6S Plus) and is estimated to last through even heavy usage all day without needing a recharge.

For how quickly the battery drains, of course if you start with a bigger battery, you have an automatic advantage. Whether that matters to you depends on how you use your phone. If, like me, you use it primarily for calls, quickly checking mail and maybe a little navigation while out of town, the 6S has an edge because it’s reportedly still lower power than the S7 in standby mode. Multiple reviewers point to Samsung bloatware on top of Android as part of this problem.

On the other hand, if your phone is your main portal to the digital world and you’re using it for music, TV, games, browsing and everything else then active-mode battery lifetime is more important and the S7 seems to be comfortably ahead of the 6S.

Either way, you eventually have to charge your phone, and that brings me to the seemingly endless guessing game on where solutions to that problem are headed. The default is wires – micro-USB for the S7, Lightning for the 6S. But of course wires are inconvenient and we’d really like to get rid of them – maybe. This has driven a resurgence in wireless charging, primarily around multiple competing standards: PMA (which you’ll find at Starbucks locations), Rezence (which is supported in a number of semiconductor devices, though I haven’t seen news of uptake in consumer products yet) and Qi (pronounced “chee”, which you can find in Ikea furniture). PMA uses inductive charging and Rezence uses magnetic resonance charging; these two have joined forces under the AirFuel Alliance. Qi is an incompatible standard that now supports both induction and resonance charging.

The S7 (and the S6 before it) makes this a don’t care by supporting both standards with a built-in IDT P9221 power-receiver. (TI had this slot in the S6, showing how fleeting is the glory of winning a slot in a major smart-phone.) So you can charge or top-up your S7 at either Starbucks or on top of an Ikea lamp at home. And you can do that without needing to add any after-market bits. Not so for the Apple phone – you have to use a plug-in induction coil at Starbucks or buy a Qi wireless charger receiver, both of which occupy the port on your phone when charging.

Do built-in wireless-charging options really matter? Starbucks and Ikea support still barely rises past the level of a curiosity – convenient but hardly the end of the world if we lost it tomorrow. The problem seems to be that wireless charging stations have not reached the critical mass needed to create demand for them in even more locations. Until then, this option may remain a nice-to-have, and charging through a wire overnight remains no more than slightly inconvenient. Which raises the question of how secure a foothold wireless charging can have in cost-sensitive smartphones.

You can read a fairly detailed user-focused comparison of the Samsung and Apple phones HERE and analysis of wireless charging options HERE.

More articles by Bernard…


RISC-V opens for business with SiFive Freedom
by Don Dingee on 07-11-2016 at 4:00 pm

When we talk about open source, free usually comes in the context of “freedom”, not as in “free beer”, and open IP often serves as a base layer of value add for commercialization. The creators of the RISC-V instruction set, now working at startup SiFive, have released specifications for their aptly-named Freedom processor IP cores looking for “enablement of great ideas”. Continue reading “RISC-V opens for business with SiFive Freedom”


How to Bring Coherency to the World of Cache Memory
by Tom Simon on 07-11-2016 at 12:00 pm

As the size and complexity of System-on-Chip designs have rapidly expanded in recent years, the need to use cache memory to improve throughput and reduce power has increased as well. Originally, cache memory was used to prevent what was then a single processor from making expensive off-chip accesses for program or data memory. With the advent of multi-core processors, caches began to play an essential role in enabling rapid sharing and exchange of data between the cores. Without caches, many of the benefits of a multi-core architecture would be lost to the inefficiencies of off-chip memory access.

As a result, processors in multi-core chips are built with cache-coherent memory interfaces. Over time many new IP blocks, such as PCIe, have been developed as part of SoC ecosystems, and many have support for cache coherency. There are of course multiple implementations of cache coherency, and even within a given interface there are parameters that can affect interoperability. In many cases there are good reasons for differing cache coherency protocols; however, this diversity of choices has stymied SoC architects and designers.

Recently I wrote about Arteris and their new Ncore cache-coherent network, which can link together IP blocks that support a variety of cache coherency protocols. Naturally, Ncore supports ARM’s AMBA ACE protocol. ARM sees the Ncore offering from Arteris as an efficient means to link together IP that uses heterogeneous cache protocols. This is great for cache-coherent IP going into SoCs, but what about IP that is still necessary but has no cache support?

Well, some of the strongest interest in Ncore apparently has come from SoC companies that are faced with integrating non-cache coherent blocks into their designs. Next I’ll discuss how this can be done with Ncore to provide all the advantages of cache coherency to those blocks.

Ncore uses Arteris FlexNoC as a transport layer for cache agents, which provides tremendous flexibility in allocating resources for cache data transfers; it also supports the cache-coherent agents. For blocks that already have a local cache, Ncore provides a protocol interface and logic units for managing coherency. IP blocks with only a traditional memory interface can use a non-coherent bridge provided by Ncore, and proxy caches can also be synthesized to meet the IP block’s needs.

The Ncore non-coherent bridge translates non-coherent transactions into IO-coherent ones. Multiple non-coherent data channels can be connected to a single bridge, allowing aggregation for more efficiency. Ncore proxy caches have read pre-fetch, write merging and ordering capability. The proxy caches are configurable up to 1MB per port. Both MSI and a subset of MEI coherence models are supported.
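For readers less familiar with the MSI model mentioned above, here is a minimal, textbook sketch of the Modified/Shared/Invalid transitions for a single cache line. It is purely illustrative and is not Arteris's implementation; real snoop or directory protocols carry considerably more state.

```python
# Textbook MSI cache-line states and transitions (illustrative only)
TRANSITIONS = {
    # (current_state, event) -> next_state
    ("I", "local_read"):   "S",   # read miss: fetch a shared copy
    ("I", "local_write"):  "M",   # write miss: fetch exclusive, line becomes dirty
    ("S", "local_write"):  "M",   # upgrade: other sharers are invalidated
    ("S", "remote_write"): "I",   # another agent writes: drop our copy
    ("M", "remote_read"):  "S",   # another agent reads: write back, keep shared copy
    ("M", "remote_write"): "I",   # another agent writes: write back, invalidate
}

def next_state(state, event):
    """Return the next MSI state; events not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

if __name__ == "__main__":
    state = "I"
    for event in ("local_read", "local_write", "remote_read", "remote_write"):
        state = next_state(state, event)
        print(f"after {event:12s} -> {state}")
```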

From the perspective of the non-cache-coherent IP, it is still talking to external memory; in actuality, Ncore presents this block as a fully cache-coherent block to the rest of the coherent network. Ncore allows the SoC architect to tune the parameters of the cache bridge to ensure optimal operation. Ncore and FlexNoC come with a fully integrated and sophisticated design suite to tailor the system to the SoC power, area, and performance requirements.

With the addition of Ncore, Arteris is now in the enviable position of offering IP for a unified SoC interconnect. Using one underlying transport layer for both coherent and non-coherent SoC data transfers lets architects build in the optimal interconnect resources. This approach maximizes utilization of chip real estate while ensuring sufficient throughput for all data requirements. For more information on Arteris and their Ncore cache-coherent network IP, go to their website.