Discussion:
Potential of the CELL Processor for Scientific Computing
AirRaid Mach 2.5
2006-05-26 22:32:43 UTC
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf

web article: http://www.hpcwire.com/hpc/671376.html
Jim Granville
2006-05-26 23:39:36 UTC
Post by AirRaid Mach 2.5
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf
web article: http://www.hpcwire.com/hpc/671376.html
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision
floating point unit, as proposed in their Cell+ implementation, these
speedups would easily double."
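The claim that speedups "would easily double" depends on what limits a kernel. A rough way to check it is a roofline-style bound: a kernel can run no faster than the lower of the arithmetic peak and memory bandwidth times its arithmetic intensity (flops per byte moved). The sketch below uses commonly quoted figures for a 3.2 GHz Cell with 8 SPEs (204.8 GFLOPS single, 14.6 GFLOPS double, 25.6 GB/s memory). Treat these as assumptions to check against the paper. The function names are hypothetical.

```python
# ASSUMED figures for a 3.2 GHz Cell with 8 SPEs (check against the paper):
SP_PEAK_GFLOPS = 204.8   # single precision, all SPEs
DP_PEAK_GFLOPS = 14.6    # double precision, all SPEs (about 14x lower)
MEM_BW_GBS = 25.6        # off-chip memory bandwidth

def attainable_gflops(peak_gflops, flops_per_byte, bw_gbs=MEM_BW_GBS):
    """Roofline bound: the lower of the compute peak and
    bandwidth times arithmetic intensity (flops per byte moved)."""
    return min(peak_gflops, bw_gbs * flops_per_byte)

def speedup_from_doubling_dp(flops_per_byte):
    """Speedup of a double-precision kernel if the DP peak doubled
    (a hypothetical Cell+-style change), all else equal."""
    before = attainable_gflops(DP_PEAK_GFLOPS, flops_per_byte)
    after = attainable_gflops(2 * DP_PEAK_GFLOPS, flops_per_byte)
    return after / before

for intensity in (0.25, 1.0, 2.0):
    print(intensity, round(speedup_from_doubling_dp(intensity), 2))
```

With these assumed numbers, a kernel doing 0.25 flops per byte is memory-bound and gains nothing (1.0x). One at 1 flop per byte gains about 1.75x, and one at 2 flops per byte doubles (2.0x). In this model, "easily double" holds only for compute-bound double-precision kernels.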

but rather than think about new HW, (which is vaporware), the Authors
could have also looked at ways of mixing the two precisions, for
Future work ?.

eg For that scientific SW that is convergence based, perhaps
a two step Software system, that uses the 14x faster 32 bit floats on
inner loops, and 64 bits on outer & final calculations, could also give
them the ~speed double ? - but on silicon they can actually get :)
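The two-step idea above is essentially mixed-precision iterative refinement. Here is a minimal sketch, assuming a small dense linear system Ax = b. Python floats are doubles, so single precision is emulated by rounding every result through `struct`. The helpers `f32`, `solve_f32` and `refine` are hypothetical names. The sketch demonstrates accuracy, not speed. The inner solves run entirely in 32-bit arithmetic, yet the answer reaches double-precision accuracy because the residuals and updates are kept in 64 bits.

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_f32(A, b):
    """Gaussian elimination with partial pivoting where every arithmetic
    result is rounded to float32: the stand-in for the fast 32-bit solver."""
    n = len(A)
    M = [[f32(v) for v in row] + [f32(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry in column k up.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    # Back substitution, still in float32.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def refine(A, b, tol=1e-12, max_iter=20):
    """Mixed-precision iterative refinement: 32-bit inner solves,
    64-bit residuals and updates. Returns (x, refinement steps used)."""
    x = solve_f32(A, b)                        # cheap first guess in float32
    scale = max(abs(v) for v in b)
    for step in range(max_iter):
        # Residual r = b - A x computed in double precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(A, b)]
        if max(abs(v) for v in r) <= tol * scale:
            return x, step
        d = solve_f32(A, r)                    # correction solved in float32
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in float64
    return x, max_iter

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]   # exact solution is [1, 2, 3]
x, steps = refine(A, b)
```

For brevity, the sketch repeats the elimination on every pass. A real code would factor A once in single precision and reuse the factors, so each refinement step costs only O(n^2) work in the fast precision. The approach also assumes the matrix is reasonably well conditioned. Otherwise, the float32 corrections stop converging.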

-jg
Gayness
2006-05-27 02:25:56 UTC
Half the average Joe doesn't know what that means
Post by Jim Granville
Post by AirRaid Mach 2.5
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf
web article: http://www.hpcwire.com/hpc/671376.html
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision
floating point unit, as proposed in their Cell+ implementation, these
speedups would easily double."
but rather than think about new HW, (which is vaporware), the Authors
could have also looked at ways of mixing the two precisions, for
Future work ?.
eg For that scientific SW that is convergence based, perhaps
a two step Software system, that uses the 14x faster 32 bit floats on
inner loops, and 64 bits on outer & final calculations, could also give
them the ~speed double ? - but on silicon they can actually get :)
-jg
Brenden D. Chase
2006-05-27 03:14:06 UTC
Post by Gayness
Half the average Joe doesn't know what that means
Yes, but any old idiot knows that 8.14 is better than 0.59 (not actual
figures, just for reference)
Post by Gayness
Post by Jim Granville
Post by AirRaid Mach 2.5
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf
web article: http://www.hpcwire.com/hpc/671376.html
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision floating
point unit, as proposed in their Cell+ implementation, these speedups
would easily double."
but rather than think about new HW, (which is vaporware), the Authors
could have also looked at ways of mixing the two precisions, for
Future work ?.
eg For that scientific SW that is convergence based, perhaps
a two step Software system, that uses the 14x faster 32 bit floats on
inner loops, and 64 bits on outer & final calculations, could also give
them the ~speed double ? - but on silicon they can actually get :)
-jg
Brenden D. Chase
2006-05-27 02:38:29 UTC
Post by Jim Granville
Post by AirRaid Mach 2.5
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf
web article: http://www.hpcwire.com/hpc/671376.html
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision floating
point unit, as proposed in their Cell+ implementation, these speedups
would easily double."
but rather than think about new HW, (which is vaporware), the Authors
could have also looked at ways of mixing the two precisions, for
Future work ?.
eg For that scientific SW that is convergence based, perhaps
a two step Software system, that uses the 14x faster 32 bit floats on
inner loops, and 64 bits on outer & final calculations, could also give
them the ~speed double ? - but on silicon they can actually get :)
-jg
by these numbers PS3 should be somewhere around 2-3x more powerful than
360???

I agree with you on the speculation of the tweaked config. Eventually at the
end of the day, it all comes down to the programmer writing code that takes
advantage of the hardware.
WiiWii
2006-05-27 09:05:44 UTC
Post by Brenden D. Chase
Post by Jim Granville
Post by AirRaid Mach 2.5
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf
web article: http://www.hpcwire.com/hpc/671376.html
Interesting.
They state
" On average, Cell is eight times faster and at least eight times
more power efficient than current Opteron and Itanium processors,
despite the fact that Cell's peak double precision performance is
fourteen times slower than its peak single precision performance.
If Cell were to include at least one fully utilizable pipelined
double precision floating point unit, as proposed in their Cell+
implementation, these speedups would easily double."
but rather than think about new HW, (which is vaporware), the
Authors could have also looked at ways of mixing the two
precisions, for Future work ?.
eg For that scientific SW that is convergence based, perhaps
a two step Software system, that uses the 14x faster 32 bit floats
on inner loops, and 64 bits on outer & final calculations, could
also give them the ~speed double ? - but on silicon they can
actually get :)
-jg
by these numbers PS3 should be somewhere around 2-3x more powerful
than 360???
It's at least 5-10x more powerful depending on how you set up everything.
I said this, what, 2-3 years ago?

As with the PS2, the design of PS3 isn't just to be more powerful and
efficient, it is to be more flexible.

That article is about HPC, which is only one area where Cell has an advantage.
Look here at its advantage over other processors in graphics, for ray
tracing. (It's 35x btw)

see table 13.
http://www-128.ibm.com/developerworks/power/library/pa-cellperf/
Post by Brenden D. Chase
I agree with you on the speculation of the tweaked config. Eventually
at the end of the day, it all comes down to the programmer writing
code that takes advantage of the hardware.
Floating Point hardware will always be faster than Non Floating Point
hardware at calculating floats no matter how good a programmer you
are. And considering all the top programmers will be working with Sony
instead of MS we should be seeing HUGE differences in technology
between the PS3 and Xbox360. Expect some titles to make the Xbox360
look like a Dreamcast.


--
larwe
2006-05-27 12:19:23 UTC
Post by Jim Granville
but rather than think about new HW, (which is vaporware), the Authors
could have also looked at ways of mixing the two precisions, for
Future work ?.
Jim, Jim, Jim... ANYTHING in cae crossposted to a video game NG (except
maybe a historical question from someone restoring a board) isn't worth
firing up the neurons to decode the original text.
Scott Michel
2006-05-30 17:29:12 UTC
Dinesh Manocha at UNC Chapel Hill hosted a workshop last week covering
"Computing at the Edge with Commodity Processors" (although, it'd be
hard to argue that Cell is currently "commodity" until the PS3 shows
up... unless one likes to pay the early adopter premium offered by IBM
and Mercury Computer Systems.)

Yes, Cell does have some interesting potential and companies like
Mercury are developing improved math libraries. Other interesting
developments include program synthesis packages (FFTW, ATLAS) that may
generate improved code given the specific input problem (Keshav
Pingali's research at Cornell), as well as "metaprogramming" packages
like RapidMind's current software (Mike McCool's startup.)
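The empirical-search idea behind autotuners such as ATLAS can be shown with a toy. The program times a kernel for several values of a tuning parameter on the actual machine and keeps the fastest. This is a simplified illustration, not how ATLAS or FFTW actually search. The names `blocked_sum`, `autotune` and the candidate block sizes are hypothetical.

```python
import time

def blocked_sum(data, block):
    """Sum a list in chunks of `block` elements (the tunable parameter)."""
    total = 0.0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total

def _time_once(kernel, data, param):
    """Wall-clock time of one kernel run with the given parameter."""
    t0 = time.perf_counter()
    kernel(data, param)
    return time.perf_counter() - t0

def autotune(kernel, data, candidates, repeats=3):
    """Time the kernel for each candidate parameter (best of `repeats`
    runs to reduce noise) and return (fastest parameter, its time)."""
    best, best_time = None, float("inf")
    for c in candidates:
        t = min(_time_once(kernel, data, c) for _ in range(repeats))
        if t < best_time:
            best, best_time = c, t
    return best, best_time

data = [float(i) for i in range(10000)]
best_block, seconds = autotune(blocked_sum, data, [16, 256, 4096])
```

The winner depends on the machine and the input, which is the point. The search is done per platform and per problem rather than fixed when the library is written.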

One of the more interesting questions is whether Cell really has
significant advantages over GPU-accelerated computations. Clearly Cell
offers more flexibility over GPU in terms of overall programming.
However, upgrading my GPU is easier than a forklift Cell upgrade. And,
Havok's game engine demonstrates respectable game physics computations
on the GPU, so why do Cell if GPUs can do the job?

OTOH, rumor has it that game engines haven't taken advantage of the
Cell SPEs and that is one of the larger causes of the PS3 release delay.


-scooter
Scott Michel
2006-05-30 17:33:32 UTC
Oh, yeah, I almost forgot: There are a lot of tricks that have to be
played to make the SPEs run and run fast. Communication with the SPEs
is exclusively via DMA transfers. It takes 275 cycles to set up DMA
to/from the SPE to off-chip memory, so latency hiding is a first-class
concern. The SPE's local store has to be aggressively managed (all 256K
of it), so the developer has to be careful how DMAs are batched.
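To see why latency hiding matters, here is a rough cycle model comparing single and double buffering. It uses the 275-cycle DMA setup figure from the post and a 256 KB local store. The function names, the 128-byte rounding and the example cycle counts are assumptions for illustration, not measured Cell numbers.

```python
DMA_SETUP_CYCLES = 275          # setup cost quoted in the post
LOCAL_STORE_BYTES = 256 * 1024  # SPE local store size

def single_buffered_cycles(n_chunks, xfer_cycles, compute_cycles):
    """One buffer: start the DMA, wait for it, then compute; nothing overlaps."""
    return n_chunks * (DMA_SETUP_CYCLES + xfer_cycles + compute_cycles)

def double_buffered_cycles(n_chunks, xfer_cycles, compute_cycles):
    """Two buffers (n_chunks >= 1): fetch chunk i+1 while computing on
    chunk i. Only the first fetch is fully exposed; each later step costs
    whichever is longer, the DMA or the computation."""
    dma = DMA_SETUP_CYCLES + xfer_cycles
    return dma + (n_chunks - 1) * max(dma, compute_cycles) + compute_cycles

def max_chunk_bytes(reserved_bytes, buffers=2, align=128):
    """Largest chunk that lets `buffers` copies fit beside code and stack
    (`reserved_bytes`), rounded down to an `align`-byte multiple
    (the 128-byte alignment is an assumption)."""
    free = LOCAL_STORE_BYTES - reserved_bytes
    return (free // buffers) // align * align

print(single_buffered_cycles(10, 100, 500), double_buffered_cycles(10, 100, 500))
print(max_chunk_bytes(64 * 1024))
```

Consider 10 chunks, each with a 100-cycle transfer and 500 cycles of compute. The model gives 8750 cycles single-buffered versus 5375 double-buffered. Once compute per chunk exceeds the DMA time, the transfers are almost completely hidden. The price is that every buffer in flight eats local store. With 64 KB reserved for code and stack, two buffers leave room for 98304-byte chunks.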

Basically, programming for the Cell SPE isn't trivial or pretty.


-scooter
Rick Jones
2006-05-30 18:54:27 UTC
Post by Jim Granville
Post by AirRaid Mach 2.5
PDF article: http://www.cs.berkeley.edu/%7Esamw/projects/cell/CF06.pdf
web article: http://www.hpcwire.com/hpc/671376.html
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision
floating point unit, as proposed in their Cell+ implementation, these
speedups would easily double."
What might that do to the power consumption of the thing?

rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Russell Wallace
2006-06-18 22:43:52 UTC
Post by Rick Jones
Post by Jim Granville
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision
floating point unit, as proposed in their Cell+ implementation, these
speedups would easily double."
What might that do to the power consumption of the thing?
Practically no effect, from what the original paper says.
--
"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.
Scott Michel
2006-06-19 16:52:29 UTC
Post by Russell Wallace
Post by Rick Jones
Post by Jim Granville
Interesting.
They state
" On average, Cell is eight times faster and at least eight times more
power efficient than current Opteron and Itanium processors, despite the
fact that Cell's peak double precision performance is fourteen times
slower than its peak single precision performance. If Cell were to
include at least one fully utilizable pipelined double precision
floating point unit, as proposed in their Cell+ implementation, these
speedups would easily double."
What might that do to the power consumption of the thing?
Practically no effect, from what the original paper says.
I'd take the paper's claims with a brick of salt: all of the results
are generated by simulator. There's very little available Cell hardware
or systems, unless you're lucky enough to unwedge a prototype blade
from IBM Austin. Evidently, the current Cell systems are real power
hogs. (Of course, that can only improve as time goes on...)


-scooter