libSIMDx86 v0.4.0
The optimized SIMD library for x86 processors.
Jump to:
Reference
Building
Roadmap
Homepage:
www.sourceforge.net/projects/simdx86
What is libSIMDx86?
libSIMDx86, also known as just SIMDx86, is an optimized math library
for developers of 3D game engines, 3D visualizations, 3D software
rasterizers, and other simulations. It uses SIMD (Single Instruction,
Multiple Data) instructions. Later it will support pixel operations
for images and digital signal processing.
How does it work?
The simple answer is with SIMD instructions. Ever since the Pentium
MMX, x86 microprocessors have had special instructions that perform
operations that cannot be represented by a single operation in C/C++
or in many other programming languages. Often these instructions can
be used to unroll a loop or to do many operations per clock cycle,
such as operating on four values (vector processing) instead of just
one (scalar processing). The end result is code that runs anywhere
from 0% to 2000% faster, depending on the circumstances and the
processor.
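To make the scalar/vector distinction concrete, here is a minimal
sketch in C. It uses compiler intrinsics from <xmmintrin.h> rather
than the hand-written assembly that libSIMDx86 itself uses, and the
function names add4_scalar and add4_vector are made up for this
illustration; they are not part of the library.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; x86 builds with SSE enabled only */
#endif

/* Scalar processing: one addition per loop iteration. */
void add4_scalar(float *out, const float *a, const float *b)
{
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Vector processing: all four additions in one SSE add (addps).
 * Falls back to the scalar loop when SSE is not enabled at compile time. */
void add4_vector(float *out, const float *a, const float *b)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);   /* load four (possibly unaligned) floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    add4_scalar(out, a, b);
#endif
}
```

Both functions compute the same four sums; with SSE enabled, the
vector version does so with a single add instruction instead of four.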
But I already wrote my library in C/C++, why use this?
Code written in C/C++ must be translated into assembly language by
your compiler. Since the compiler often misses the larger picture and
focuses on a per-line or per-element basis, it often outputs scalar
instructions (i.e. instructions that work on a single value at a
time) even when, to a human, the parallelism is apparent. Very few
compilers output vector (SIMD) instructions, because doing so
requires an extremely sophisticated compiler. libSIMDx86, however, is
about 95% written in assembly language using vector (SIMD)
instructions and does not need to be translated into a lower-level
language. Also, since humans can think about a problem, the library
includes some interesting solutions that are not available to
non-thinking compilers.
Why create this library?
Let's face it, compilers are getting much better. Even before these
special SIMD instructions existed, people used to write the most
critical parts of their code in assembly language in order to get as
much performance out of their processor as possible. Now, it seems
that using basic assembly language gets you code that is merely on
par with the compiler. Compilers such as the GNU C Compiler have
mastered the i386 instruction set and all of its quirks. However,
SIMD output has always been a difficult task for compilers, since the
constructs of SIMD-capable code are usually quite difficult to
translate, and the author's intent is harder still to see. Efforts
have been made, but even the simplest SIMD constructions still baffle
compilers. Until that day far in the future when compilers output
SIMD code correctly and efficiently, this library will allow
programmers to take advantage of SIMD operations that have been lying
dormant in their processors since the Pentium MMX (1997). The results
range from no improvement to mild or dramatic improvements in
performance.
Another problem is that high levels of abstraction often cause the
programmer to forget the hardware the program is running on. Learning
assembly language is considered obsolete for all but the most
esoteric purposes, such as writing an OS kernel or hacking programs.
However, knowledge of the machine's architecture and instruction set
allows people to take advantage of the true power of their processor
and squeeze every ounce of performance out of it. In short, if you do
not want to learn SIMD assembly language but do want to take
advantage of a heavily optimized library without sweating, this is
for you. Note also that, in all of Sourceforge, there are no
relatively complete libraries available that use SIMD instructions.
What about Intel and AMD? Don't they make a similar library?
Indeed they do, and I wouldn't be half surprised if they outperformed
libSIMDx86 right now. However, one has to understand that each
company has not only a team of extremely diligent programmers, but
also a library that is optimal for its own implementation of the x86
architecture. That means that, given two processors, one from Intel
and one from AMD, that perform equally in everything else, the Intel
math library will run better on the Intel processor and the AMD
library will run better on the AMD processor. Why is this? AMD and
Intel know the best way of doing things for their own architectures.
Sometimes the optimization is at the hardware level, other times at
the software level. Hardware-level optimization, i.e. programming for
a single processor such as the Athlon64, is one advantage that the
Intel/AMD math libraries will have over this one. This library,
however, will attempt to blend performance across many processors,
including older ones, not just the latest and greatest Pentium 4 or
Athlon64 (at the time of this writing). Another thing to note is that
their libraries are not free, and not open source by any means.
What SIMD instruction sets does libSIMDx86 support?
libSIMDx86 supports Intel's MMX, SSE, SSE2, and SSE3, and AMD's
3DNow!, 3DNow!+, and MMX+. Additionally, standard x87 (i.e. non-SIMD)
versions of the functions are provided as a fallback and as a
control. There is no support for Cyrix's extensions to MMX: their
processors are next to impossible to find, and documentation on the
instructions is mostly nonexistent. If someone can send me documents
and a Cyrix processor/motherboard, I will definitely add support for
it.
I'll try it. How do I use the functions/where can I find examples?
Examples are included with the source tree, and documentation for the
library can be found through the Reference link at the top. If you
have any questions on the library's use, you may also email the
author.
How can I build my special version of libSIMDx86?
See the Building link at the top.
How can I build libSIMDx86 for non-x86 processors?
Ah... right... well, there had to be at least one who would ask,
right? Portability is a big issue for cross-platform developers,
especially those targeting a different architecture altogether.
Despite the name, it is possible to run libSIMDx86 on non-x86
processors: compile without defining a SIMD instruction set. As of
version 0.3.1, the non-SIMD version is written entirely in portable
C, so it could run on a PowerPC64 or UltraSPARC machine, for example.
Obviously, it won't be faster than a normal C library of similar
functions, but at least your code can compile and link without big
#ifdef ... #else ... #endif blocks all over the place.
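The idea of keeping #ifdef blocks out of application code can be
sketched as follows. The macro SKETCH_USE_SSE and the function
sketch_dot4 are hypothetical names chosen for this example; they are
not libSIMDx86's actual build flags or API. The point is only that
the conditional compilation lives in one place, behind one function
name.

```c
/* SKETCH_USE_SSE is a hypothetical build flag, not a real libSIMDx86 macro. */
#if defined(SKETCH_USE_SSE) && defined(__SSE__)
#include <xmmintrin.h>

/* SSE path: four multiplications in one instruction, then a scalar sum. */
float sketch_dot4(const float *a, const float *b)
{
    float tmp[4];
    _mm_storeu_ps(tmp, _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return (tmp[0] + tmp[1]) + (tmp[2] + tmp[3]);
}
#else
/* Portable C path: compiles on any architecture (PowerPC, SPARC, ...). */
float sketch_dot4(const float *a, const float *b)
{
    return (a[0] * b[0] + a[1] * b[1]) + (a[2] * b[2] + a[3] * b[3]);
}
#endif
```

Code that calls sketch_dot4 looks the same on every platform; only
the build configuration decides which body is compiled.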
So if it can run under non-x86 architectures, do you intend to port it to use other architectures' SIMD instructions?
Well, yes and no. The name libSIMDx86 itself somewhat defies the idea
of “porting” to other architectures; however, if I had some
documentation and hardware, I probably could implement some
functions. For example, if I had a PowerPC with VMX, or an UltraSPARC
II with VIS, I could do some accelerated functions. I would like to
make this library the ultimate solution for SIMD processing, a sort
of one-stop shop, though I might rename the project if it gets that
far.
Can I build libSIMDx86 as a DLL or SO?
I suppose it is possible, but it is definitely not officially
supported. Perhaps you could edit the output build script under
UNIX/Linux and add -fPIC for position-independent code. Note that
this will probably degrade performance. Because the library is
written mostly in assembly language, I don't see why statically
linking it into a project would increase the binary size that much.
Some people hate having multiple binaries though, and that is
perfectly understandable too. However, if implemented correctly with
what is known as code overlays, you will effectively have a
statically linked DLL/SO file. See the Roadmap section for more
information.
Can you make libSIMDx86 select a function set at runtime rather than compile time?
Yes, actually I can. The ultimate goal of libSIMDx86 is to have
overlaid code: rather than going through expensive function pointers
as a DLL or SO file does, the application links libSIMDx86
statically. At runtime, an instruction set (e.g. MMX or SSE) is
selected, and the library overwrites its code with the optimized
version. The result: all the flexibility of a DLL or SO file, with
all the speed of a statically linked library. Doing this requires the
operating system's help, which means libSIMDx86 will not be portable
to all operating systems, or rather, that each one requires an
explicit port rather than a mere recompilation.
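For comparison, the function-pointer approach that the overlay design
is meant to avoid might look roughly like this. This is only a sketch
of runtime selection in general; it is not the overlay technique and
not libSIMDx86's code. The names scale4_c, scale4_sse, and
select_scale4 are invented for the example, and
__builtin_cpu_supports is a GCC/Clang builtin that is assumed to be
available.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Every call goes through this pointer type: the cost the overlay avoids. */
typedef void (*scale4_fn)(float *v, float s);

/* Portable fallback: multiply four floats by s, one at a time. */
static void scale4_c(float *v, float s)
{
    for (size_t i = 0; i < 4; i++)
        v[i] *= s;
}

#if defined(__SSE__)
/* SSE version: one multiply instruction for all four floats. */
static void scale4_sse(float *v, float s)
{
    _mm_storeu_ps(v, _mm_mul_ps(_mm_loadu_ps(v), _mm_set1_ps(s)));
}
#endif

/* Pick an implementation once, at runtime, based on what the CPU reports. */
scale4_fn select_scale4(void)
{
#if defined(__SSE__) && defined(__GNUC__)
    __builtin_cpu_init();                 /* GCC/Clang CPU feature detection */
    if (__builtin_cpu_supports("sse"))
        return scale4_sse;
#endif
    return scale4_c;
}
```

A caller would store the result of select_scale4() once and call
through it afterwards. An overlay, as described above, would instead
patch the chosen code into place so that callers use an ordinary
direct call.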
I can't compile this under Microsoft Visual C++ 6, 7, or 8! What is wrong?
libSIMDx86 only compiles successfully under GCC (that I know of).
Sorry, the problem is the inline assembly language syntax.
But I need to use libSIMDx86 under my MSVC++ project!
Well, you have a few options.
* You could wait until the code overlay version comes out.
* You could compile it as a DLL using GCC (which is not supported by
this project, but possible), then use runtime linking.
* You could convert the AT&T/GNU syntax to Intel/MSVC syntax and then
recompile it. If you do this, please send me the resulting work; I
wouldn't mind having an MSVC++ version.
* You could make the project into separate .asm files.
* You could use only the C version of the project (but then you would
have no SIMD acceleration).
* You could convert your project to use GCC instead of MSVC++ (it is
not as hard as you think!).
Couldn't you just convert the project to separate .asm files to assemble?
Yup, I could.
Aren't you going to do it?
Eventually (see the Roadmap), but there is a lot of work involved.
The next release will include this.
There is a function I need that isn't implemented. Can you do it for me?
It is possible: just send an email to the project admin(s), or post a
feature request on the website (below).
Something is broken here! How do I submit a bug/patch?
Eeek! Since libSIMDx86 is written mostly in assembly language, it is
very possible that a bug crept in. If you think you have found a bug
and can isolate it to a function, then by all means submit a patch
(if you can) or a bug report at the website at Sourceforge.net:
www.sourceforge.net/projects/simdx86/
A surefire way of confirming a bug in a function is to build
libSIMDx86 in x87 mode, so that it uses correct but non-SIMD
instructions, and compare the outputs of the functions given the same
input.
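Such a comparison could be automated with a small helper along these
lines. The name first_mismatch and the use of a tolerance are
assumptions of this sketch, not part of libSIMDx86. A tolerance is
used because x87 and SIMD code can round intermediate results
differently, so two correct outputs may not match bit for bit.

```c
#include <stddef.h>

/* Compare a reference output (e.g. from an x87 build) against the output
 * under test (e.g. from an SSE build) for the same input.
 * Returns the index of the first element that differs by more than tol,
 * or -1 if the two outputs agree everywhere. */
long first_mismatch(const float *ref, const float *test, size_t n, float tol)
{
    for (size_t i = 0; i < n; i++) {
        float d = ref[i] - test[i];
        if (d < 0.0f)
            d = -d;                 /* absolute difference */
        if (!(d <= tol))            /* written this way so NaN is also flagged */
            return (long)i;
    }
    return -1;
}
```

Running the same input through both builds and passing the two result
arrays to a helper like this narrows a suspected bug down to a
specific function and element, which is useful detail for a bug
report.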