<div class="gmail_quote">On Tue, Mar 8, 2011 at 5:00 PM, Raymond Wagner <span dir="ltr">&lt;<a href="mailto:raymond@wagnerrp.com">raymond@wagnerrp.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">On 3/8/2011 15:40, mythtv wrote:<br>
> I'm sure there are others who have their backend on 24/7. Is anyone else<br>
> (esp the devs) intrigued by the idea of replacing a 35W CPU (+ additional<br>
> system power) with a 1W CPU with higher performance?<br>
> <a href="http://blogs.nvidia.com/2011/02/tegra-roadmap-revealed-next-chip-worlds-first-quadcore-mobile-processor/" target="_blank">http://blogs.nvidia.com/2011/02/tegra-roadmap-revealed-next-chip-worlds-first-quadcore-mobile-processor/</a><br>
<br>
</div>Take a close look at those pictures. The ARM builds are done using<br>
heavily optimized settings and GCC 4.4.1. The Intel builds are done<br>
using typical settings and GCC 3.4.6. Just building with the (much<br>
much) newer 4.4.1 puts the Core 2 ahead of the ARM, and by using<br>
similarly optimized settings, the Core 2 is a good 50% more powerful<br>
than the ARM. Also understand that this test is only performing integer<br>
math, and the ARM platform has traditionally had pathetic FPUs.<br>
<br>
So 66% of the performance at only 1W consumption, that's pretty good,<br>
right? Well you're still not getting the whole story. That is one<br>
measurement using a synthetic benchmark, which as clearly shown can be<br>
falsified, and it is done using a quad core part. The scheduler is<br>
single threaded. The independent backend and MySQL bits are done<br>
sequentially, and not in parallel. The backend code is not parallel,<br>
and the sql calls are not something that can be broken into multiple<br>
threads well by the MySQL server. On a single threaded workload, the<br>
ARM is now only 33% the performance of the Core 2.<br>
<br>
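To make that arithmetic explicit, here is a rough sketch in Python. The names are purely illustrative, and it assumes throughput divides evenly across cores, which is optimistic for both chips:<br>
<pre>
```python
# Relative throughput, normalised so the dual core Core 2 scores 1.0.
# The 1.5x figure is the "50% more powerful" estimate quoted above.
CORE2_TOTAL = 1.0
CORE2_CORES = 2
ARM_TOTAL = CORE2_TOTAL / 1.5  # roughly two thirds of the Core 2
ARM_CORES = 4


def per_core(total, cores):
    """Throughput of one core, assuming perfect scaling across cores."""
    return total / cores


def single_thread_ratio():
    """ARM single threaded performance as a fraction of the Core 2's."""
    arm = per_core(ARM_TOTAL, ARM_CORES)
    core2 = per_core(CORE2_TOTAL, CORE2_CORES)
    return arm / core2


if __name__ == "__main__":
    print(f"all cores:     {ARM_TOTAL / CORE2_TOTAL:.0%}")
    print(f"single thread: {single_thread_ratio():.0%}")
```
</pre>
<br>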
Commercial flagging is going to be a bit different, because the decoding<br>
and detection can be handled in independent threads. Video decoding in<br>
North America is still either going to be MPEG2, which is single<br>
threaded, or H264 out of an HDPVR, which is single sliced and thus still<br>
single threaded. The T7200 at 2.0GHz won't quite be capable of handling<br>
full bitrate HDPVR output in real time, so the ARM at less than half the<br>
performance per core won't come close. If you intended to live with the<br>
scheduler constraints on an underpowered backend, you would still want<br>
to have a separate machine (maybe your frontends) do your video<br>
processing for you.<br>
<br>
Now let's look at power consumption. The Core 2 is rated at 34W TDP, but<br>
that's both cores at full speed, plus heavy cache use; it's the absolute<br>
worst case scenario. More realistically, since this is largely single<br>
threaded, it's going to be closer to 20-25W while running the<br>
scheduler. When it's finished in 1/3 the time of the ARM, it will drop<br>
back to low power mode, and being a laptop part, it will be well under<br>
10W. 10W run non-stop, at average North American utility rates, equates<br>
to around $10/yr in power consumption. Even the desktop processors can<br>
be downclocked when idle such that the entire system will run under 25W<br>
at the wall. Do understand that a significant portion of the power<br>
consumption is going to be from the attached tuners, hard drives, and<br>
STBs (if you need analog capture), which are going to be the same<br>
regardless of what CPU you're using, and will likely end up consuming<br>
far more than that 10W idle power of a mobile Core 2.<br>
<br>
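The yearly cost figure is easy to check. A minimal sketch, where the $0.11/kWh rate is an assumed ballpark for a North American residential rate and not a measured number, so substitute your own:<br>
<pre>
```python
HOURS_PER_YEAR = 24 * 365


def annual_cost_usd(watts, usd_per_kwh=0.11):
    """Yearly cost of a constant load. The default rate is an assumption."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * usd_per_kwh


if __name__ == "__main__":
    # 10W is the idle mobile Core 2 figure, 25W the whole idle desktop system.
    for watts in (10, 25):
        print(f"{watts}W non-stop: about ${annual_cost_usd(watts):.2f}/yr")
```
</pre>
At that assumed rate, 10W comes to a little under $10/yr and 25W to roughly $24/yr.<br>
<br>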
Let's take this a bit further. This T7200 chip they're comparing<br>
against was a release part. It's one of the original 65nm processors<br>
released in mid-2006 when the Core 2 line was first launched. They're<br>
comparing their brand new, not-yet-available processor to one that's<br>
nearly five years old and several generations behind in both<br>
microarchitecture and fabrication techniques. I doubt they're even<br>
still available for purchase. I'd like to see the comparison between a<br>
modern dual core Sandy Bridge part, with a 17W TDP and turbo speed of<br>
2.7GHz, or quad core part with a 45W TDP and turbo speed of 3.4GHz<br>
(turbo being where one or more cores shut down to allow others to run at<br>
higher speed in the same power envelope). It will be even more<br>
interesting to see the AMD Bulldozer parts due out in a few months, where the<br>
cores physically have a trench dug around them, with gating to allow<br>
whole chunks of the chip to be completely powered down, and even the<br>
high end eight and sixteen core parts are expected to have an idle<br>
consumption under 10W.<br>
<div class="im"><br>
> I was surprised that I couldn't find any discussion of the backend running<br>
> on ARM but then again ARM CPUs have never had this kind of horsepower<br>
> before.<br>
<br>
</div>There have been some, but they've all come to the same conclusion: ARM is<br>
not sufficiently high performance to recommend for a backend. Individual<br>
users claimed it was 'good enough', but they had limited channel count,<br>
with one or few tuners, and were willing to put up with the minute or<br>
longer scheduler runs.<br>
<br></blockquote><div> </div><div>Nice evaluation, Raymond. <br><br>/Brian/<br></div></div>