1. Introduction
These notes describe the Base Compiler Development portion
(compiler_dev) of the 5.2 IRIS Development Option from
Silicon Graphics, Inc. They include discussion of compiler
tools, header files, libraries, dynamic shared objects, and
KPIC directives.
Note: Packaged with the IRIS Development Option software is
a separate sheet that contains the Software License
Agreement. This software is provided to you solely
under the terms and conditions of the Software
License Agreement. Please take a few moments to
review the Agreement.
This document contains the following chapters:
1. Introduction
2. Installation Information
3. Changes and Additions
4. Bug Fixes
5. Known Problems and Workarounds
In addition, Appendix A discusses dynamically shared objects
(DSOs).
1.1 Release_Identification_Information
Following is the release identification information for the
Base Compiler Development portion (compiler_dev) of the 5.2
IRIS Development Option:
Software Product                 Compiler_dev
Version                          3.18
Product Code                     SC4-IDO-5.2
System Software Requirements     IRIX 5.2 or later
1.2 Online_Release_Notes
After you install the online documentation for a product
(the relnotes subsystem), you can view the release notes on
your screen.
If you have a graphics system, select ``Release Notes'' from
the Tools submenu of the Toolchest. This displays the
grelnotes(1) graphical browser for the online release notes.
Refer to the grelnotes(1) man page for information on
options to this command.
If you have a nongraphics system, you can use the relnotes
command. Refer to the relnotes(1) man page for accessing
the online release notes.
1.3 Product_Support
Silicon Graphics, Inc., provides a comprehensive product
support maintenance program for its products.
If you are in the U.S. or Canada and would like support for
your Silicon Graphics-supported products, contact the
Technical Assistance Center at 1-800-800-4SGI. If you are
outside these areas, contact the Silicon Graphics subsidiary
or authorized distributor in your country.
2. Installation_Information
The IRIS Software Installation Guide fully documents the
process for installing the Base Compiler Development
software. In addition, each compiler has its own set of
release notes that describes product-specific installation
information.
2.1 3.18_Base_Compiler_Development_Subsystems
The 3.18 Base Compiler Development software (compiler_dev)
includes these subsystems:
compiler_dev.books            Base compiler books
compiler_dev.books.dbx        Base compiler dbx User's Guide
compiler_dev.hdr              Base compiler headers
compiler_dev.hdr.internal     Base compiler internal headers
compiler_dev.hdr.lib          Base compiler environment headers
compiler_dev.man.base         Base compiler components man pages
compiler_dev.man.ld           Base compiler loader man pages
compiler_dev.man.perf         Base compiler performance man pages
compiler_dev.man.util         Base compiler utility man pages
compiler_dev.sw               Base compiler software
compiler_dev.sw.abi           Base compiler ABI software
compiler_dev.sw.base          Base compiler components
compiler_dev.sw.ld            Base compiler loader
compiler_dev.sw.perf          Base compiler performance tools
compiler_dev.sw.util          Base compiler utilities
compiler_dev.man.dbx          dbx manual page
compiler_dev.man.lib          Development environment manual pages
compiler_dev.man.relnotes     These release notes
compiler_dev.sw.dbx           dbx debugger
compiler_dev.sw.lib           Development libraries
2.1.1 Subsystem_Disk_Space_Requirements This section lists
the compiler_dev subsystems (and their sizes).
If you are installing this software for the first time, the
subsystems marked ``default'' are those selected for
installation automatically. They will be installed when you
give the go command unless you explicitly request (with the
keep command) that they not be installed.
Those marked ``miniroot'' must be installed from the
miniroot.
Note: The listed subsystem sizes are approximate. Refer to
the IRIS Software Installation Guide for information
on finding exact sizes.
Subsystem Name                         Subsystem Size
                                       (512-byte blocks)
compiler_dev.hdr.internal                    455
compiler_dev.man.base (default)               48
compiler_dev.man.ld (default)                 50
compiler_dev.man.perf (default)               53
compiler_dev.man.util (default)               82
compiler_dev.sw.base (default)              8415
compiler_dev.sw.ld (default)                1549
compiler_dev.sw.perf (default)              2250
compiler_dev.sw.util (default)              2684
compiler_dev.hdr.lib (default)              2691
compiler_dev.man.dbx (default)                95
compiler_dev.man.lib (default)              5405
compiler_dev.man.relnotes (default)          147
compiler_dev.sw.dbx (default)               1656
compiler_dev.sw.lib (default)              12299
3. Changes_and_Additions
The features in this chapter are new or significantly
changed in the Base Compiler Development software since the
IRIX 4.0.5 Maintenance release. Except as noted, changes
apply to all versions.
3.1 Compiler_System
This section lists changes and additions to compilers and
development tools since the IRIX 4.0.5 Maintenance release.
3.1.1 Obsoleting_libmld libmld, either in the form of an
archive or a DSO, will not be released or supported in
future releases. If you use functions in the existing
libmld library, contact the Technical Assistance Center for
details concerning the migration of your libmld function
calls in your existing source code to other calls, probably
in libraries such as libelf, the ELF object-file support
library, and libraries containing symbol table manipulation
routines.
3.1.2 Dynamic_Linking_and_DSOs In earlier versions of IRIX
(pre-5.0), executables were only statically linked. This
means that all references must be resolved (and their
addresses fixed) at link time (by ld(1)). In this release,
such programs, although they might use pre-5.0 shared
libraries (which are referred to now as static shared
libraries) are referred to as non-shared. They are produced
by compiling and linking with the -non_shared option. The
code so created is not position-independent (PIC).
In 5.0 and later IRIX releases, in addition to being
statically linked by ld(1), programs are, by default,
compiled as PIC code and dynamically linked, that is, part
of the program may be relocated dynamically at run time.
There are two types of dynamically linked objects:
o The executable itself. This consists of your main
program and PIC code extracted from all archive
libraries linked with it. Code within the executable
is not relocated at run time, but some of its
references will be. The executable is linked
-call_shared.
o External sharable dynamically linked objects called
dynamic shared objects (DSOs), which are not part of
the executable itself. DSOs and their references may
be dynamically relocated at run time. DSOs are linked
-shared. DSOs by convention have the extension .so. A
DSO may be shared by several users and/or programs,
possibly at different addresses.
You cannot mix non-shared objects and PIC objects in the
same executable.
In this and future releases, static shared libraries are
supported only for the use of existing (pre-5.0) executables
that reference them. You can neither create new static
shared libraries nor link new code with existing static
shared libraries.
PIC code satisfies references indirectly by using a Global
Offset Table (GOT), which allows code to be relocated simply
by updating the GOT. Your executable has one GOT, and each
DSO it uses has one GOT.
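The indirection described above can be pictured with a toy model in C.
This is only an analogy and makes no claim about how ld(1) or rld(1)
lay out a real GOT: the table toy_got and the functions add_one,
add_two, call_slot0, and relocate_slot0 are hypothetical names
invented for the sketch. The point it shows is that the caller never
holds the target address itself, so changing one table entry redirects
every call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a real GOT is built by ld(1) and updated by rld(1). */
typedef int (*func_ptr)(int);

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

/* One slot per external reference; callers never embed the target address. */
static func_ptr toy_got[1] = { add_one };

/* A caller in the style of PIC code: it always goes through the table slot. */
int call_slot0(int x)
{
    assert(toy_got[0] != NULL);
    return toy_got[0](x);
}

/* "Relocation": only the table entry changes; call_slot0 is untouched. */
void relocate_slot0(int use_second)
{
    toy_got[0] = use_second ? add_two : add_one;
}
```

Calling call_slot0(1) before and after relocate_slot0(1) gives
different results even though the code of call_slot0 is unchanged.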
When a dynamically linked executable is started, the runtime
linker, rld(1), is invoked to prepare the program for
execution. This preparation involves:
o Filling in certain global values.
o Relocating any dynamic shared objects (DSOs) that your
program references.
o Resolving data symbols in DSOs that were unresolved at
static link time by ld(1).
With very few exceptions, all executable objects in this
release are dynamically linked. A new component, the
runtime linker /lib/rld, and all standard DSOs (file
extension .so) are necessary for programs to execute.
More information about these types of objects appears in
Appendix A, ``Frequently Asked Questions about DSOs,'' and
in the IRIX System Programming Guide.
3.1.3 Object_File_Format_Changes The compiler tools and
the link editor now produce ELF format objects and
executables by default. DSOs are supported only in ELF
executables and object files. COFF files are run on IRIX
5.0 and later releases with the IRIX 4.0.5 ABI, and ELF
files are run with the IRIX 5.0 and later ABI; hence, the
linker refuses to mix (pre-5.0) COFF and ELF objects.
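A build script that must keep the two formats apart could test the
first bytes of each object file. The sketch below is not taken from
any SGI tool; looks_like_elf is a hypothetical helper, and the only
fact it relies on is the standard ELF identification sequence (0x7f
followed by the letters E, L, F). A real tool would more likely use
the definitions in the ELF header files.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf begins with the ELF magic number 0x7f 'E' 'L' 'F',
   0 otherwise (including when fewer than 4 bytes are available). */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 4 &&
           buf[0] == 0x7f && buf[1] == 'E' &&
           buf[2] == 'L'  && buf[3] == 'F';
}
```

The caller is expected to read the first four bytes of the file into
buf; any file for which the function returns 0 is not an ELF object.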
Two new header files are associated with ELF objects:
/usr/include/elf.h contains definitions that are generic to
all implementations. /usr/include/sys/elf.h contains
definitions specific to the MIPS architecture. See the
System V Application Binary Interface and System V
Application Binary Interface MIPS Processor Supplement,
published by Prentice Hall.
A new object file reader, elfdump(1), is associated with ELF
format files. This program is known on some other SVR4-
compliant systems as dump.
3.1.4 ABI_Development For information about ABI
development issues, see the man pages abicc(1), abild(1),
check_abi_compliance, check_abi_interface and
check_for_syscalls.
3.1.5 Versioning_of_Shared_Objects In the 5.0.1 release, a
mechanism for the versioning of shared objects was
introduced for SGI-specific shared objects and executables.
Note that this mechanism is outside the scope of the ABI,
and, thus, must not be relied on for code that must be ABI-
compliant and run on non-SGI platforms. Currently, all
executables produced on SGI systems are marked SGI_ONLY,
which allows use of the versioning mechanism.
Versioning allows the creator of a shared object to update
it in a way that may be incompatible with executables
previously linked against the shared object. This is
accomplished by renaming the original shared object and
providing it along with the (incompatible) new version.
Versioning is mainly of interest only to developers of
shared objects. It may not be of interest to you if you
simply use shared objects.
3.1.5.1 What_Is_a_Version? A version is part or all of an
identifying version_string that can be associated with a
shared object by using the -set_version version_string
option to ld(1) when the shared object is created.
A version_string consists of one or more versions separated
by colons (:). A single version has the form:
[<comment>#]sgi<major>.<minor>
where
<comment>   is a comment string, which is ignored by the
            versioning mechanism. It consists of any
            sequence of characters followed by a #.
sgi         is the literal string sgi.
<major>     is the major version number, which is a
            string of digits [0-9].
.           is a literal period.
<minor>     is the minor version number, which is a
            string of digits [0-9].
Here is what to do when building your shared library:
o When you first build your shared library, give it an
initial version, say sgi1.0. Thus, add the option
-set_version sgi1.0 to the command to build your shared
library (cc -shared, ld -shared).
o Whenever you make a compatible change to the shared
object, create another version by changing the minor
version number, for example, sgi1.1, and add it to the
end of the version_string. The command to set the
version of the shared library might now look like
-set_version "sgi1.0:sgi1.1" .
o When you make an incompatible change to the shared
object:
- Change the filename of the old shared object by
adding a dot followed by the major number of one
of the versions to the filename of the shared
object. Do not change the soname of the shared
object or its contents. Simply rename the file.
- Update the major version number and set the
version_string of the shared object when you
create it to this new version, for example,
-set_version sgi2.0.
Here is how this versioning mechanism affects executables:
o When an executable is linked against a shared object,
the last version in the shared object's version_string
is recorded in the executable as part of the liblist.
This can be examined by elfdump -Dl.
o When you run an executable, rld looks for the proper
filename in its usual search routine.
o If a file with the correct name is found, the version
specified in the executable for this shared object is
compared to each of the versions in the version_string
in the shared object. If one of the versions in the
version_string matches the executable's version exactly
(ignoring comments), then that library is used.
o If no proper match is found, a new filename for the
shared object is built by taking the soname specified
in the executable for this shared object and the major
number found in the version specified in the executable
for this shared object, and putting them together as
soname.major. (Remember that you did not change the
soname of the object, only the filename.) The new file
is searched for using rld's usual search procedure.
3.1.5.2 Example: Suppose you have a shared object foo.so
with initial version sgi10.0. Over time, you make two
compatible changes for foo.so, which result in the following
final version_string for foo.so:
initial version #sgi10.0: upgrade I/O#sgi10.1:new devices#sgi10.2
You then link an executable that uses this shared object,
useoldfoo. This executable specifies version sgi10.2 for
soname foo.so. (Remember that the executable inherits the
last version in the version_string of the shared object.)
The time comes to upgrade foo.so in an incompatible way.
Note that the major version of foo.so is 10, so you move the
existing foo.so to the filename foo.so.10 and create a new
foo.so with the version_string:
efficient interfaces #sgi11.0
New executables linked with foo.so use it directly. Older
executables, like useoldfoo, attempt to use foo.so, but find
that its version (sgi11.0) is not the version they need
(sgi10.2). They then attempt to find a foo.so in the
filename foo.so.10 with version sgi10.2.
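The matching and fallback steps in this example can be sketched in C.
This is not rld's code; version_matches and fallback_filename are
hypothetical helpers written only to mirror the rules stated above.
One assumption goes beyond the text: if an entry contains more than
one #, everything up to the last # is treated as the comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `wanted` (for example "sgi10.2") exactly equals one of the
   colon-separated versions in `version_string`, ignoring any "comment#"
   prefix on each entry; returns 0 otherwise. */
int version_matches(const char *version_string, const char *wanted)
{
    size_t wanted_len;
    const char *p;

    assert(version_string != NULL && wanted != NULL);
    wanted_len = strlen(wanted);
    p = version_string;

    while (*p != '\0') {
        size_t len = strcspn(p, ":");   /* length of this entry */
        const char *v = p;              /* start of the part after the comment */
        size_t i;

        /* Assumption: the comment ends at the last '#' in the entry. */
        for (i = 0; i < len; i++) {
            if (p[i] == '#')
                v = p + i + 1;
        }
        if ((size_t)((p + len) - v) == wanted_len &&
            strncmp(v, wanted, wanted_len) == 0)
            return 1;

        p += len;
        if (*p == ':')
            p++;                        /* step over the separator */
    }
    return 0;
}

/* Builds "soname.major" from the version recorded in the executable.
   Returns 0 on success, or -1 if `wanted` does not start with "sgi"
   followed by a major number and a period, or if buf is too small. */
int fallback_filename(const char *soname, const char *wanted,
                      char *buf, size_t buflen)
{
    const char *major, *dot;
    int n;

    assert(soname != NULL && wanted != NULL && buf != NULL);
    if (strncmp(wanted, "sgi", 3) != 0)
        return -1;
    major = wanted + 3;
    dot = strchr(major, '.');
    if (dot == NULL || dot == major)
        return -1;
    n = snprintf(buf, buflen, "%s.%.*s", soname, (int)(dot - major), major);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With the strings from the example, useoldfoo's version sgi10.2 matches
the old version_string but not the new one, and the fallback name for
soname foo.so is foo.so.10.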
3.1.6 Runtime_Link_Editor_rld(1) and libdl
o rld is a new program that is invoked when running a
dynamic executable. It maps in shared objects used by
this executable, resolves relocations as ld does at
static link time, and allocates common if required.
rld is mapped in at program startup time by the kernel.
Its path is /lib/rld, but you can change it with the
_RLD_PATH environment variable.
There are two versions: rld and rld.debug. The first
is faster, the second provides debugging support. Both
are described on the rld(1) man page.
o Options to rld can be specified via the _RLD_ARGS
environment variable. It is possible to replace
libraries without recompiling, get extra information
from the runtime linker, and alter some of the dynamic
linking semantics by specifying arguments in this way.
See the manual page rld(1) for details.
o The functionality previously available in
/usr/lib/libdl.so, a user interface to the dynamic
linker for manipulating the shared objects used by a
dynamic executable, is now part of libc.so.1.
Specifically, this includes the function calls
dlopen(3), dlclose(3), dlsym(3), and dlerror(3).
The following change in rld(1) was made in the 5.0.1 release
of IRIX:
o In release 5.0, rld zeroed the stack space it had used
before invoking the main program. As of release 5.0.1,
it no longer zeroes this space. If your program had a
bug that relied on an uninitialized automatic variable
being zero, the bug may be uncovered by this rld
change. If you suspect this to be the case, the
previous behavior (rld clearing its used stack space at
exit) can be obtained temporarily by adding the option
-clearstack to the environment variable _RLD_ARGS when
you run the program. However, do not rely on this
mechanism; there is no guarantee that the stack space
your program is relying on being zero will not be
dirtied by other startup code in future releases. The
buggy behavior in your program must be corrected. Note
that these problems most often will occur relatively
early in the call graph of your program.
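The kind of bug this change can expose looks like the following. The
function count_positive is a hypothetical example, not code from any
SGI library; it shows the corrected form, with the buggy declaration
described in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the positive entries in values[0..n-1]. */
int count_positive(const int *values, size_t n)
{
    /* Buggy form: "int count;" with no initializer. It only appeared to
       work while the stack slot happened to contain zero. */
    int count = 0;
    size_t i;

    assert(values != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (values[i] > 0)
            count++;
    }
    return count;
}
```

Giving every automatic variable an explicit initial value removes the
dependence on whatever earlier startup code left on the stack.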
The following change was made to functions in libdl in the
5.0.1 release:
o In the 5.0 release, when a shared object was opened via
dlopen(3x), its symbols became globally visible. This
behavior has been changed to be consistent with SVR4.
As of the 5.0.1 release, objects loaded by one
invocation of dlopen may not directly reference symbols
from objects loaded by a different dlopen invocation.
Those symbols may, however, be referenced indirectly
using dlsym(3x).
See the NOTES section of the dlopen(3x) manual page for
further information.
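An indirect reference through dlsym might look like the sketch below.
The helper call_via_dlsym and the function type unary_fn are
hypothetical names chosen for illustration; the calls dlopen, dlsym,
dlerror, and dlclose and the RTLD_LAZY flag are the standard
interfaces, but check the dlopen(3x) manual page for the modes
supported on your release.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

typedef double (*unary_fn)(double);

/* Opens the shared object at `path`, looks up `symbol`, and calls it
   with x. Returns 0 and stores the result on success, -1 on failure. */
int call_via_dlsym(const char *path, const char *symbol,
                   double x, double *result)
{
    void *handle;
    unary_fn fn;
    const char *msg;

    assert(path != NULL && symbol != NULL && result != NULL);

    handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        msg = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", msg ? msg : "unknown error");
        return -1;
    }

    /* Storing through a void ** avoids a direct cast from an object
       pointer to a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        msg = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", msg ? msg : "symbol is null");
        dlclose(handle);
        return -1;
    }

    *result = fn(x);
    dlclose(handle);
    return 0;
}
```

Because the symbol is fetched by name at run time, the calling object
does not need a direct reference to it, which is the case the rule
above disallows between separately loaded objects.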
3.1.7 Changes_to_dbx(1)
o In 5.0.1 and later, you can set the variable
$assumenormalframe to decrease the time dbx takes to
produce a stack trace (by the where command), by using:
set $assumenormalframe=1
This variable should be set to zero (the default) when
requesting a stack trace if you are stopped in the
function prologue.
o Two new commands in dbx(1) deal with shared objects:
listobj and whichobj.
There are three new printing commands: printo, printx,
and printd. These print in octal, hexadecimal, and
decimal, respectively.
Command-line editing similar to that available in
emacs(1) is now available in dbx.
See /usr/lib/dbx.help for details on these new
commands.
o The dbx help system has been enhanced.
o The -f and -F options to dbx have been removed. The
readsyms and readglobals commands have been removed.
dbx now always does fast startup (the -f option) so
these options and commands are no longer needed.
3.1.8 Archiver_ar(1) The default format for the archive
symbol table has been changed. The default is now the same
as ar E and produces an SVR4-compatible symbol table. If
you want to produce the old symbol table format, use ar C.
3.1.9 Link_Editor_ld(1) The following changes have been
made to the linker ld(1):
o As of release 5.0.1, the linker can adjust executables
to avoid certain problems with early versions of the
R4000. If the -no_jump_at_eop flag is on (it is on by
default), small amounts of padding are added between
component objects to avoid placing a branch instruction
at the end of a page. Slightly smaller and significantly
faster executables can result from turning this option
off (using the -allow_jump_at_eop flag).
Binaries built either way should be compatible across
all Silicon Graphics systems, but those made with
-no_jump_at_eop (the default) often show performance
gains on R4000 systems.
o New options have been added to ld(1) for aligning
variables in the global uninitialized data area (bss).
See the manual page for ld(1) for options with names
beginning with -X. These new options are unique to
IRIX and might change across releases.
o The default object and executable file format has been
changed to ELF. Under no circumstances can you link
together ELF and (old) COFF objects.
o Static shared libraries are replaced by dynamic shared
objects. The linker no longer supports linking with
static shared libraries. However, existing executables
linked with static shared libraries continue to work.
o By default, the linker reports all undefined and
unresolved symbols and exits with non-zero status.
However, for shared linking, it is possible to allow
unresolved symbols at static link time and rely on the
runtime linker to complete the resolution at run time.
If you specify -ignore_unresolved, the linker does not
consider unresolved symbols to be errors. This option
is turned on by the driver if the environment variable
SGI_SVR4 is set.
o The linker now reports a maximum of 50 warning
messages. If you want all warning messages to be
printed, specify -wall.
o The following new flags are related to DSO support.
Please refer to the manual page for details: -B
symbolic, -non_shared, -call_shared (default), -shared,
-all, -exclude, -no_archive, -transitive_link (default),
-check_registry, -update_registry, -set_version,
-ignore_unresolved (default), -no_unresolved,
-no_library_replacement, -soname, -delay_load, and
-export.
3.1.10 Optimizer_(uopt(5)) New optimizations and
improvements to existing optimizations have been added to
uopt.
o -strictIEEE
The optimizer performs some floating point expression
simplification in the presence of floating point
constants, which can cause different behavior in
programs that rely on strict adherence to the IEEE
floating point standard. An example is the
substitution of zero for multiplication by zero. This
flag suppresses such optimizations.
o -Wo,-nomultibbunroll
The optimizer now unrolls loops whose bodies contain
branches (that is, loop bodies made up of multiple
basic blocks). This internal optimizer flag suppresses
such unrolls.
o -noinline
This option disables the inlining operation performed
by umerge under -O3. This flag is not meaningful if
-O3 is not specified.
o -inline_to
The default value of this parameter is 0. A positive
value of this parameter asks umerge to perform
additional inlining of calls to leaf routines up to the
specified level, in addition to its automatic decision
mechanism. A value of 1 causes all calls to leaf
procedures to be inlined. A value of 2 additionally
causes all calls to procedures that became leaves due
to level 1 inlining to be inlined, etc. Under this
option, a procedure becomes a leaf in the inlined
output code if and only if the procedure's maximum
distance from a leaf in the call graph is less than or
equal to the value of this parameter. This option is
not affected by the -noinline option and is meaningful
only if -O3 is not specified.
o -nokpicopt
This option tells uopt not to perform the special
optimization for accesses of global variables when
compiling shared. (-kpicopt is the default for shared
compilations)
o -kpicopt
This option tells uopt to perform the special
optimization for accesses of global variables that are
not gp-relative whether compiling shared or non-shared.
(-nokpicopt is the default for non-shared compilations;
however, some programs, particularly if compiled -G 0,
might benefit from this optimization even if compiled
-non_shared.)
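To see why the -strictIEEE flag matters, consider the substitution of
zero for multiplication by zero mentioned above. Under IEEE
arithmetic the product is not always positive zero. The sketch below
uses hypothetical function names (times_zero,
zero_substitution_changes_result, demo_strict_ieee) and a volatile
operand so that the multiplication is actually performed rather than
folded at compile time.

```c
#include <assert.h>
#include <math.h>

/* Multiplies by a zero the compiler cannot treat as a constant, so the
   product is computed at run time. */
double times_zero(double x)
{
    volatile double zero = 0.0;
    return x * zero;
}

/* Returns 1 if replacing x * 0.0 by +0.0 would change the result. */
int zero_substitution_changes_result(double x)
{
    double product = times_zero(x);
    return isnan(product) || signbit(product) != 0;
}

/* Self-check of the cases discussed in the note below. */
void demo_strict_ieee(void)
{
    assert(zero_substitution_changes_result(2.0) == 0);      /* +0.0 */
    assert(zero_substitution_changes_result(-1.0) == 1);     /* -0.0 */
    assert(zero_substitution_changes_result(INFINITY) == 1); /* NaN  */
}
```

For finite positive x the simplification is harmless, but a negative
x yields negative zero and an infinite or NaN operand yields NaN, so
a program that depends on those results needs the strict behavior.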
3.1.11 Assembler_(as(1))
o Several new assembler directives are added to support
generation of PIC (Position-Independent Code). You
should also become familiar with the MIPS ABI
Supplement and the PIC coding model it describes. See
Section 3.2, ``KPIC Directives.''
o The assembler generates ELF object file format. Whether
the resulting object is PIC depends on whether an
.option pic0 or .option pic2 directive appears in the
assembler file and on command-line arguments. (The
directive appearing in the .s file takes precedence.)
In the .option directive, pic0 indicates non-PIC, and
pic2 indicates PIC code. PIC code can also be
specified on the command line (in the absence of an
.option directive) by the switch -KPIC. If no .option
is present in the assembler file and -KPIC does not
appear on the command line, the default is non-PIC.
o A number of new optimizations have been added to the
assembler. They are invoked automatically at
optimization level 2 (-O2) and above. See the as man
page for more information about -peep, -swpipe, and
-symregs.
o Cross basic-block scheduling is now enabled by default
at optimization levels 2 and above. It can be disabled
with the -Wb,-noxbb option. This optimization moves
instructions from one basic block to another to allow
for better scheduling.
o Since the last release, enhancements have been made in
the software pipelining and peephole optimizations in
the assembler.
3.1.12 Libraries The following changes to the libraries
that are part of the compiler system were made in the 5.0.1
release.
o The exception handling library, libexc.so, has been
changed to allow for correct handling of exceptions in
Ada code and for the correct functioning of non-local
GOTOs in Pascal code. Previous to this release, non-
local GOTOs appearing in Pascal code in a shared object
did not function correctly. Due to implementation
changes in the handling of non-local GOTOs necessary to
correct this problem, all Pascal code, whether in a
shared object or not, should be compiled and relinked
in 5.0.1 and later. If you are certain that none of
your Pascal code uses non-local GOTOs, you can ignore
this requirement.
o With the 5.0.1 and later releases, C++ code is linked
by default with the new shared object libC.so, which is
a shared version of libC.a. See the C++ release notes
for further information.
3.1.13 Performance_Tools This section includes changes to
pixie(1), pixstats(1), and prof(1). It also includes a
detailed note (with an example) on using these tools with
DSOs.
o The program cord(1) is not provided in this release.
o These tools will not work on executables produced on
IRIX 4 systems. For IRIX 4 functionality, you should
invoke the IRIX 4 pixie, prof, and pixstats explicitly.
They will not be run automatically under the IRIX
compatibility mode.
The following changes to pixie have occurred in the 3.18
Base Compiler Development release. See the pixie(1) manual
page for more information.
o pixie no longer produces a .Addrs file. This
information is now contained in a section called
``.MIPS.Addrs'' in the instrumented object.
o At run time, only one .Counts file is generated
per thread. Previously, there was one
.Counts file for every DSO and main in a thread.
Multiple .Counts files occur when forks and
multiprocessing calls occur.
o pixie now instruments automatically all shared
libraries in the program's internal liblist. This
means that for most shared programs, you only need to
invoke pixie on the main executable.
o The old -o option has been renamed -pixie_file. The
-pixie_file option allows the user to rename the
instrumented output executable. The default is to name
the output file the same as the input file with the
suffix .pixie added.
o The option previously named -bbcounts has been renamed
-counts_file. The -counts_file option allows the user
to rename the output counts file. The default is to
name the output file the same as the input file with
the suffix .Counts added.
o The -branchcounts option is now default. See the
description of the -branchcounts option below.
o -verbose permits printing most pixie transformation
messages.
o The new -liblist option causes pixie to write out a
list of dependent dynamic shared libraries to a file
with the same base name as the main executable with
.liblist as the suffix. This has no effect when used
on libraries or non-shared programs. The command:
pixie -liblist my_prog
generates a file my_prog.liblist.
o The new -autopixie option tells pixie to instrument all
dependent dynamic shared libraries recursively. This
has no effect when used on libraries or non-shared
programs. -autopixie is on by default.
o When the new -longbranch option is used, pixie
transforms branches into jumps. This should only be
used when pixie complains about branches out of range.
A branch can become out of range because pixie inserts
code into the executable in order to perform the
runtime performance data gathering and branches
previously within range become out of range.
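The range limit behind the -longbranch option can be illustrated with
a small check. The release notes do not give the limit; the sketch
relies on the MIPS instruction encoding, in which a conditional
branch carries a signed 16-bit word offset relative to the
instruction after the branch. The function name branch_in_range is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a MIPS conditional branch at branch_addr can reach
   target_addr. The signed 16-bit word offset is relative to the delay
   slot at branch_addr + 4, so the byte displacement must lie in
   [-131072, 131068]. */
int branch_in_range(uint32_t branch_addr, uint32_t target_addr)
{
    int64_t disp;

    assert(branch_addr % 4 == 0 && target_addr % 4 == 0);
    disp = (int64_t)target_addr - ((int64_t)branch_addr + 4);
    return disp >= -131072 && disp <= 131068;
}
```

A branch whose target sits exactly at the limit stops being
encodable as soon as one more instruction is inserted between the
branch and its target, which is how instrumentation can push a
previously valid branch out of range.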
In addition, the following changes of note have occurred in
pixie in recent 5.x releases.
o When instrumenting a shared library, the text segment
could grow to overlap with the data segment. In the
current implementation, the data segment is moved to a
higher region in the virtual address space to avoid
this conflict. For the main program, if the user
compiled it with the -ld option to specify the text and
data address, it is the user's responsibility to leave
enough space in the data segment to have it
instrumented properly. Otherwise, pixie will generate
an error message.
o Signal handling is done by intercepting the ksigaction
system call at runtime and instrumenting the sigreturn
system call when pixie is run. Pixie image register
values not saved in the sigcontext structure are thus
saved and restored.
o The -branchcounts option causes pixie to add more
counting code so the instrumented program produces
specific information on branch use. pixstats
automatically understands the new information.
Specifically, information is produced for the following
events:
- Branch to branch taken
- Branch to branch untaken
- Untaken conditional branches
- Taken conditional branches
- Taken conditional branches with branch nops
- Untaken conditional branches with branch nops
- Direction-predicted conditional branches with
branch nops
- Non-sequential fetches
- Taken branches per conditional branch
- Forward taken branches per conditional branch
- Forward untaken branches per conditional branch
- Backward taken branches per conditional branch
- Backward untaken branches per conditional branch
o The -pids option tells pixie to append the process ID
number on the end of the .Counts name. This is handy
if you want to run the program instrumented with pixie
through a variety of tests before generating the
statistics with pixstats. This option should be used
with the -pids option to pixstats, which is available
on the 5.0.1 and later releases.
o -threeway may be used on the 5.0.1 and
later releases to suppress pixie transformations on
threeway transfers (low-level graphics hardware
access). If you are instrumenting libgl.so with pixie
on a system that has VGX, GTX or Reality Engine
graphics, your program may use this special mechanism
for some graphics operations. If you experience
problems running your instrumented graphics application
on these systems (problems usually result in the
graphics simply being black), re-instrument your
libgl.so with the correct -threeway option. Use
-threeway 3000 for RealityEngine systems and -threeway
6000 for VGX and GTX systems.
o -quiet was added in 5.0.1 to suppress most pixie
transformation messages.
o -table can be used in 5.0.1 and later releases to cause
pixie to write a copy of its translation table to the
stdout device. The translation table is a map of the
original addresses to the instrumented addresses.
o Static shared libraries are no longer supported.
o -oldtrace is no longer supported.
o Several options to pixie meant for internal use only
are no longer available. These are:
- -get_shared_data
- -calculate_registers
- -sharedlib
The following changes to pixstats(1) have been made in the
5.x release. See the manual page for more information.
o -excludelibs tells pixstats to ignore statistics from
libraries. By default, pixstats outputs statistics
that include all libraries.
o -pids ... tells pixstats to combine the
statistics found in .Counts., .Counts.,
etc., in its output. If your program uses sproc(2),
fork(2), is compiled with Power Fortran or Power C, or
you used the -pids option when you instrumented it, the
.Counts file resulting from its execution will be
placed in .Counts., and you must use pixstats
-pids to process it.
o The .Counts and .Addrs files generated by 4.0.5 pixie
are no longer supported. You cannot use old versions
of these files with the performance tools on this
release.
o pixstats now looks at the file header to choose the
timing table. If the file header indicates:
MIPS3 r4000 timing is used
MIPS2 r6000 timing is used
MIPS1 r2000 timing is used
o -disassemble disassembles basic blocks with zero
counts. The old behavior can be produced with
-dislimit 1.
o -source or -S option has been added to provide source
listing with disassembly.
o -mips2 has been added as a synonym for -r6000.
o -mips3 has been added as a synonym for -r4000.
3.1.13.1 Using_pixie(1)_and_pixstats(1)_with_DSOs DSOs can
be instrumented for basic block counting. All shared
libraries used by an instrumented executable must also be
instrumented.
3.1.13.1.1 Example: Instrument a Program with Shared
Libraries To run a program instrumented with pixie, you
must instrument all the dependent DSOs. pixie will now
instrument the main program and the needed libraries
automatically:
pixie my_prog
Or, you can instrument each one individually:
pixie -noautopixie my_prog
pixie lib1
pixie lib2
:
pixie libn
pixie tells you which libraries need to be instrumented if
you use the -liblist option. With this option, pixie
produces a file named my_prog.liblist that contains the
names of the needed dynamic shared libraries with their full
paths. This is convenient if you wish to build a dependency
list for a makefile or shell script. For example:
pixie -liblist -noautopixie my_prog
foreach lib (`cat my_prog.liblist`)
pixie $lib
end
WARNING: During static instrumentation, pixie cannot
accurately detect dynamic shared libraries that are loaded
with calls to dlopen(). rld detects that the main program
has been instrumented and appends .pixie to the name of any
file to be opened with dlopen(). However, you must still
instrument these libraries yourself.
The runtime linker (rld) needs to know where the
instrumented libraries are. Set the environment variable
LD_LIBRARY_PATH to the directory where you keep the
libraries or put the instrumented libraries in the current
default search path for rld.
setenv LD_LIBRARY_PATH `pwd`
or
setenv LD_LIBRARY_PATH .
tells rld to look in the current directory.
You could just as easily put all of your instrumented
libraries in a single directory and set LD_LIBRARY_PATH to
that path. Remember that, to profile the program, both
pixstats and prof need either the originals of, or links to:
o The original DSOs and a.out
o The instrumented DSOs and a.out
o The .Counts files that were produced by running the
instrumented program
You can gather statistics for the whole program or a
specific DSO:
pixstats gives the statistics (including DSOs).
pixstats -excludelibs gives the statistics
(excluding DSOs).
pixstats gives the statistics of a DSO.
3.1.13.1.2 Example: Instrument a Program That Uses Multiple
DSOs
1. Run pixie on the program to instrument both the
main program and the shared libraries it depends on:
pixie my_prog
2. Run the program to completion:
my_prog.pixie file1 file2
There should now be one .Counts file, my_prog.Counts,
which was created when the application ran.
3. Run pixstats to generate the statistics:
pixstats my_prog > my_prog.stat
3.1.13.1.3 Example:_Instrument_an_MP_Program In this
example, you instrument a Fortran multiprocessing program.
1. Compile an MP Fortran program:
f77 -o myprog -mp myprog.f
2. Instrument the program and its libraries:
pixie myprog
3. Run the program to completion:
setenv LD_LIBRARY_PATH .
myprog.pixie
There should be one .Counts file per thread per DSO.
For example, running myprog.pixie with four threads produces:
myprog.Counts.1001, myprog.Counts.1002,
myprog.Counts.1003, myprog.Counts.1004,
.
.
4. Analyze the output in one of the following ways:
o To analyze each of the threads:
pixstats myprog myprog.Counts.1001
pixstats myprog myprog.Counts.1002
pixstats myprog myprog.Counts.1003
pixstats myprog myprog.Counts.1004
o To analyze the sum of the threads:
pixstats myprog myprog.Counts.*
o To analyze the sum of the threads excluding all
libraries:
pixstats myprog -dso myprog
o To analyze a thread using prof:
prof -pixie myprog myprog.Counts.1004
o To analyze all threads together using prof:
prof -pixie myprog myprog.Counts.*
3.2 KPIC_Directives
PIC code is generated if either the directive .option pic2
appears in the assembler file or the assembler (as(1)) is
invoked with -KPIC in the absence of an explicit .option
pic0 or .option pic2 in the assembler file. Unless PIC code
is being generated, the other options in this section are
ignored by the assembler, with the exception of .gpword,
which becomes .word. Thus, you can easily use the same
assembler file for generating PIC and non-PIC (that is,
non-shared) objects by not placing .option pic0 or .option
pic2 in the assembler file and invoking the assembler
without -KPIC (for non-shared) or with -KPIC (for PIC code).
o .option pic2
This directive forces the assembler to mark the output
object file as containing PIC code and activates the
following directives. It overrides the command line
argument. Normally, you don't need to specify this
directive. Instead, you should use -KPIC or
-non_shared to toggle between generating PIC or non-
PIC.
Note that even though -KPIC is the default for the
high-level language driver (cc/pc/f77), it is not the
default for assembly sources. In the absence of an
.option pic0 or .option pic2, you must explicitly
specify -KPIC for compiling .s files to get PIC code.
o .cpload reg
This directive expands into three instructions that set
the gp register to the context pointer value for the
current function. It should always be placed in a
noreorder area (that is, it should be preceded by .set
noreorder and followed by .set reorder). This
directive expands into:
lui gp,_gp_disp
addiu gp,gp,_gp_disp
addu gp,gp,reg
_gp_disp is a reserved symbol that the linker sets to
the distance between the lui instruction and the
context pointer. This directive is required at the
beginning of each subroutine that uses the gp register.
You must add this directive at the beginning of every
procedure, with the exception of leaf procedures that
do not access any global variables and procedures that
are static (that is, not marked .globl or .extern).
Note: The MIPS ABI requires that .cpload use register
$25.
o .cprestore offset
This directive causes the assembler to issue:
sw gp,offset(sp)
where it appears. Additionally, it causes the
assembler to emit:
lw gp,offset(sp)
after every jump-and-link (jal) (but not branch-and-
link (bal)) operation, thereby restoring the gp
register after function calls. You are responsible for
allocating the stack space for the gp. This space
should be in the saved register area of the stack frame
to remain consistent with calling and debugger
conventions.
o .gpword local-sym
This directive is similar to .word, except that the
relocation entry for local-sym has the R_MIPS_GPREL32
type. After linkage, this results in a 32-bit value
that is the distance between local-sym and the context
pointer (that is, the gp). local-sym must be local.
It is currently used for PIC switch tables.
o .cpadd reg
This directive adds the value of the context pointer
(gp) to reg.
EXAMPLES:
The following is a simplified version of the hello world
program:
.option pic2
.data
.align 2
$$5:
.ascii "hello world\\X0A\\X00"
.text
.align 2
main:
.set noreorder
.cpload $25
.set reorder
subu $sp, 40
sw $31, 36($sp)
.cprestore 32
la $4, $$5
jal printf
move $2, $0
lw $31, 36($sp)
addu $sp, 40
j $31
The actual instructions generated by the assembler will be:
lui gp,0 #
addiu gp,gp,0 # generated by .cpload
addu gp,gp,t9 #
lw a0,0(gp) # gp-relative addressing used
lw t9,0(gp) # t9 is used for func. call
addiu sp,sp,-40
sw ra,36(sp)
sw gp,32(sp) # from .cprestore
jalr ra,t9 # jal is changed to jalr
addiu a0,a0,0
lw ra,36(sp)
lw gp,32(sp) # activated by .cprestore
move v0,zero
jr ra
addiu sp,sp,40
nop
PIC Linkage Conventions
o The MIPS ABI requires register t9 ($25) to be used for
indirect function calls, so .cpload should always use
$25. Noreorder mode must be in effect when the .cpload
directive is encountered. Also, make sure that t9 is
not in use before any function call, as its value will
be destroyed.
o If your program uses an indirect jump (jalr), you must
also use t9 as the jump register.
o If you have an unconditional jump to an external label:
j _cerror
you have to rewrite it into an indirect jump via t9,
that is:
la t9,_cerror
j t9
o If you use a branch-and-link (bal) instruction for
calling a function in the same file, and the target
procedure begins with a .cpload, your bal must be to an
alternate entry point in the function after the
.cpload:
foo: .set noreorder # callee
.cpload $25
.set reorder
$$1: ... # alternate entry point
...
j $31 # foo returns
bar: ... # caller
...
bal $$1 # bypass the .cpload
...
This is very important because .cpload assumes register
$25 contains the address of foo, but in this case, $25
is not set up. Note that because both foo and bar
reside in the same file, they must have the same value
for $gp. So the .cpload instructions can be and must
be bypassed. However, because foo can still be called
from outside, the .cpload is still required.
Alternatively, if you don't want to have an alternate
entry point, you can set up register $25 before the
bal:
la t9,foo
bal foo
or, if foo is an external symbol, you can simply use a
jal (and allow the assembler to set up t9 for you).
Both of these methods are slightly less efficient than
adding an alternate entry.
o .gpword and .cpadd are used together to implement a
position-independent jump table (or any table of text
addresses). Entries of the address table created by
.gpword are converted into displacements from the
context pointer. To get the correct text address, use
.cpadd to add the value of gp back to them. Because
the gp is updated by the runtime linker, the correct
text address can be reconstructed regardless of the
location of the DSO.
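The arithmetic behind the .gpword and .cpadd pairing can be modeled in a few lines of C. This is a sketch under stated assumptions, not assembler or linker code: addresses are modeled as plain 32-bit unsigned integers, and the function names gpword_entry, cpadd_target, and survives_relocation are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What a .gpword entry holds: the 32-bit distance from gp to the label.
 * Unsigned arithmetic wraps modulo 2^32, so a label below gp is fine. */
uint32_t gpword_entry(uint32_t label_addr, uint32_t gp)
{
    return label_addr - gp;
}

/* What .cpadd does at run time: add gp back to the table entry. */
uint32_t cpadd_target(uint32_t entry, uint32_t gp)
{
    return entry + gp;
}

/* Hypothetical check: an entry computed at link time still resolves to
 * the right address after the whole DSO (text and gp alike) has been
 * moved by delta bytes. Returns nonzero on success. */
int survives_relocation(uint32_t label_addr, uint32_t gp, uint32_t delta)
{
    uint32_t entry = gpword_entry(label_addr, gp);

    return cpadd_target(entry, gp + delta) == label_addr + delta;
}
```

Because the table stores only displacements, no entry needs to be rewritten when the DSO is loaded at a different address; only gp changes.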
3.3 Library_and_System_Call_Functionality
The following additions and changes were made to library and
system call functionality between versions 4.1 and 5.2 of
the IRIS Development Option.
o The source programming interfaces to system calls and
system libraries in IRIX 5.0.1 and later are compatible
with those in IRIX 4.0. Code that compiled under
IRIX 4.0 and uses commonly recognized practices
for writing portable code should compile without
modification on IRIX 5.0.1 and later.
o Reentrant versions of some libc functions have been
provided. These correspond to the POSIX 1003.4a
specification for reentrant functions. These functions
are present in the default compilation mode. If you are
compiling in POSIX-compliant mode (_POSIX_SOURCE
defined), compile programs with the feature
test macro _SGI_REENTRANT_FUNCTIONS defined.
o The POSIX 1003.4a specification for making stdio
multi-thread safe has been implemented. In the
default compilation mode, all stdio functions are
thread safe. In POSIX or ANSI compilation mode, the
program must define the feature test macro
_SGI_MP_SOURCE in order to get the thread-safe versions
of stdio functions and macros.
o The handling of the global error value, errno, has
changed from IRIX 4.0. If the program includes
<errno.h> and defines the feature test macro
_SGI_MP_SOURCE, references to errno actually reference a
per-thread errno; otherwise, the global variable errno
is accessed. All system calls update both the
per-thread and global versions of errno.
o The MIPS ABI mutual exclusion library libmutex.so is
supported. The actual implementation of the routines
is in libc.so.1. These routines, init_lock,
acquire_lock, release_lock, and stat_lock, provide
low-level portable access to a mutual exclusion
primitive (see abilock(3x)).
o The math library libm.a has been carefully checked to
ensure its conformance with both the SVID 3rd Edition
and ANSI X3.159-1989. Specific information can be
found in the man pages sinh, exp, bessel, floor, gamma,
math, hypot, sqrt, and trig.
o The interface to the function scalb(3m) has changed to
conform to SVR4. In previous releases, the type of the
second argument to scalb (the exponent) was int. In
this release, the type of the second argument is
double. In addition, the functions scalb and rint have
been moved from the math library to the C library.
o A new option, flush_to_zero, has been added to
libfpe.a. On an R4000-based system, using this option
can improve execution performance if many floating
point underflows occur.
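As an illustration of the reentrant libc functions mentioned in the list above, the following C sketch tokenizes a string with strtok_r, which keeps its position in a caller-supplied pointer instead of hidden static state. It assumes strtok_r is among the reentrant functions provided (these notes do not list them), and count_words is a hypothetical helper. The _POSIX_C_SOURCE line is what a current POSIX system expects; on IRIX 5.2 in POSIX mode, these notes call for _SGI_REENTRANT_FUNCTIONS instead.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes strtok_r on a current POSIX system */

#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the whitespace-separated words in buf.
 * buf is modified in place, as with strtok. All parsing state lives
 * in the local variable save, so concurrent callers working on
 * different buffers do not interfere with each other.
 */
size_t count_words(char *buf)
{
    char *save = NULL;
    char *tok;
    size_t n = 0;

    assert(buf != NULL);
    for (tok = strtok_r(buf, " \t\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save))
        n++;
    return n;
}
```

The non-reentrant strtok would keep the equivalent of save in static storage, which is why it cannot safely be used from several threads at once.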
4. Bug_Fixes
This section lists the significant bugs fixed in the base
compilers since the IRIX 4.0.1 release.
4.1 Compiler_Bug_Fixes
4.1.1 Linker_(ld(1))
o The default cache size was changed to the size of the
R4000 cache (8K) in 5.0.1. This default may still be
changed by use of the -Xcachesize size option to ld.
o The size of the bss is now one-half what it was in IRIX
4.0.1. The bss region in an a.out is now essentially
the same size as it would have been in IRIX 3.3.3.
o Incremental linking using the -A option has been
fixed. Adding -allow_jump_at_eop to an ld -A link is
no longer necessary.
o The -Xlocaldata option now works correctly, including
its special symbols.
o Many memory leaks in the linker have been fixed. This
regains most of the linker performance lost in the
previous release.
4.1.2 Run-time_Linker_(rld(1))_and_libdl(3x) The following
bugs were fixed in the 5.0.1 release of rld and the dynamic
linking library libdl.
o In 5.0.1, dlopen(3x) of a shared object which was
created with the -init option calls the -init routine
before dlopen returns. In 5.0, the -init routine was
not called at dlopen.
o In 5.0, libdl routines could call exit(2) under certain
circumstances (for example, if the desired library
could not be opened). In 5.0.1, the libdl routines
return an error value under these circumstances as
documented in their manual pages.
o In the 5.0 release, when a shared object was opened via
dlopen(3x), its symbols became globally visible. This
behavior has been changed to be consistent with SVR4.
In the 5.0.1 release, objects loaded by one invocation
of dlopen may not directly reference symbols from
objects loaded by a different dlopen invocation. Those
symbols may, however, be referenced indirectly using
dlsym(3x).
See the NOTES section of the dlopen(3x) manual page for
further information.
4.1.3 Assembler_(as(1,5)) Several bugs in the assembler
have been fixed since the previous release. These include
bugs in the various assembler optimizations such as software
pipelining and peephole optimization.
4.1.4 Optimizer_(uopt(5)) Numerous significant bugs have
been fixed since the IRIX 4.0.5 release.
4.1.5 Code_Generator_(ugen(5)) Several problems with code
generation have been fixed since IRIX 4.0.5.
o Several problems with unaligned data accesses have been
fixed. (1127521, 129034)
o Code generation for FORTRAN's SIGN function has been
fixed.
o An overflow problem with Pascal passing large objects
has been fixed (126986).
4.1.6 The_Debugger_dbx(1) In the version of dbx released
with 5.0, attempts to use the
stop
or
trace
constructs failed. The dbx documentation states:
``If an expression is given, that expression is assumed to be a
pointer and the thing-pointed-at is inspected at the
`appropriate' points.''
In the 5.0 version, the expression itself was inspected at
the `appropriate' points, rather than the thing it points
to. The result was an inoperative trace or stop command.
This problem was fixed in 5.0.1.
4.1.7 Performance_Tools The stability of pixie was greatly
improved in the 5.0.1 release. In addition, as of 5.0.1 it
is possible to instrument a multiprocessing program with
pixie.
As of the 3.18/5.2 release, prof can now collect statistics
about dynamic shared libraries. In addition, multiprocessor
support is now working.
4.1.8 Libraries The following bugs have been fixed in
libraries.
o The exception handling library, libexc, has been
changed to allow for correct functioning of non-local
GOTOs in Pascal code. In previous releases, non-local
GOTOs appearing in Pascal code in a shared object did
not function correctly. Due to implementation changes
in the handling of non-local GOTOs necessary to correct
this problem, all Pascal code, whether in a shared
object or not, should be recompiled and relinked in 5.0.1
and later. If you are certain that none of your Pascal
code uses non-local GOTOs, you can ignore this
requirement.
o The atof and strtod functions now return correctly
signed HUGE_VAL for arguments too large in magnitude.
In addition, strtod sets errno to ERANGE.
o The ldexp function now correctly returns HUGE_VAL and
sets errno to ERANGE if the result overflows.
o The precision of conversion between ASCII and binary
floating point has been significantly improved in this
release.
o Rounding into the least-significant digit of an output
floating point format is now done correctly in all
cases. In previous releases, printing .00053 with a
format of %.3f printed 0.000 instead of the (correct)
0.001.
o Various bugs against math library manual pages have
been fixed.
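Two of the library fixes above (the strtod overflow behavior and the rounding of the last printed digit) are easy to check from C. The sketch below uses only standard library calls; the wrapper names parse_double and format_3dp are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>     /* HUGE_VAL (a macro; no math library call is made) */
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: convert text with strtod and report through
 * *range_error whether the conversion set errno to ERANGE.
 */
double parse_double(const char *text, int *range_error)
{
    double value;

    assert(text != NULL && range_error != NULL);
    errno = 0;                       /* strtod never clears errno itself */
    value = strtod(text, NULL);
    *range_error = (errno == ERANGE);
    return value;
}

/* Hypothetical wrapper: format value with three decimal places. */
int format_3dp(double value, char *buf, size_t size)
{
    return snprintf(buf, size, "%.3f", value);
}
```

With the fixed library, parse_double("-1e999", &r) returns -HUGE_VAL with r set, and format_3dp(0.00053, buf, sizeof buf) leaves 0.001 in buf.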
5. Known_Problems_and_Workarounds
This section lists known problems with the 3.18 base
compiler portion of the IRIS Development Option.
5.1 Optimizer_(uopt(5))
o In certain cases (usually with very large subroutines),
uopt has grown unreasonably large while running (over
70 MB). This causes systems with smaller amounts of
memory to thrash and, in extreme cases, to run out of
available swap space. This should be suspected if uopt
dies with a ``signal 9,'' which means that the process
was killed externally (for example, by the operating
system), rather than by a bug that caused an internal
failure.
Almost all optimizer problems can be narrowed to a
single subroutine. By identifying the problem
routine(s), you do not need to suppress optimization on
the whole program, only on the smaller subset.
o A considerable number of new optimizations have been
added to the assembler. These optimizations are turned
on at level -O2; if they fail, they tend to look like
optimizer problems.
5.2 Performance_Tools
o The following known problems exist in pixie(1):
- Trace features are currently not supported. This
is to say that they have not been tested and thus
cannot be guaranteed to work.
- Objects loaded using dlopen() cannot be
instrumented automatically.
o The following problem exists in pixstats(1):
The DSOs must be in or linked to the current directory
when executing pixstats.
o The following problems exist in prof(1):
- prof (-pixie) -testcoverage or -gprof cannot
process basic block counts for shared libraries.
If you need to process basic block counts, compile
the code with the -non_shared flag.
- prof cannot process information from dynamic
shared libraries that have been opened with
dlopen() and have the same name but different
paths, for example:
/path1/libl.so
/path2/libl.so
5.3 Libraries
These are known problems in compiler-associated libraries:
o In general, routines in the -lm43 library might not
conform to either SVR4 or IEEE with respect to
diagnostics or return values. These discrepancies are,
however, described in the manual pages of the
constituent functions. (See Section 3.5 for math
library changes). The following particular problems
are known (these problems exist in -lm43 routines, but
not in -lm routines):
- The -lm43 functions pow, hypot, and cabs might
fail to return NaN when given a NaN argument. The
return value in these cases is Infinity for hypot
and cabs and either Infinity or zero for pow.
- If the magnitude of their argument is greater than
one, the -lm43 functions acos and asin return
zero, pi/2, or pi rather than the (correct) NaN.
- The -lm43 y0, y1, and yn functions return NaN
(instead of -Infinity) when the argument is zero.
These functions also produce underflow
inconsistently (with respect to -lm).
- The version of gamma in the -lm43 library loops
indefinitely if it is given Infinity as an
argument.
o The single-precision version of log, logf, is
imprecise. In particular, logf(x) might not
approximate -logf(1/x) as well as expected. The
double-precision version does not exhibit this
behavior.
1. Dynamic_Shared_Objects
A Dynamic Shared Object, or DSO, is an ELF format object
file, very similar in structure to an executable program but
with no "main". It has a shared component, consisting of
shared text and read-only data; a private component,
consisting of data and the GOT (Global Offset Table);
several sections that hold information necessary to load and
link the object; and a liblist, the list of other shared
objects referenced by this object. Most of the libraries
supplied by SGI are available as dynamic shared objects.
A DSO is relocatable at runtime; it can be loaded at any
virtual address. A consequence of this is that all
references to external symbols must be resolved at runtime.
References from the private region (e.g., from private data)
are resolved once at load time; references from the shared
region (e.g., from shared text) must go through an
indirection table (GOT) and hence have a small performance
penalty associated with them.
Code compiled for use in a shared object is referred to as
Position Independent Code (PIC), whereas non-PIC is usually
referred to as non-shared. Non-shared code and PIC cannot
be mixed in the same object.
At runtime, exec loads the main program and then loads rld,
the runtime linking loader, which finishes the exec
operation. Starting with main's liblist, rld loads each
shared object on the list, reads that object's liblist, and
repeats the operation until all shared objects have been
loaded. Next, rld allocates common and fixes up symbolic
references in each loaded object. (This is necessary
because we don't know until runtime where the object will be
loaded.) Next, each object's init code is executed.
Finally, control is transferred to "__start".
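The liblist walk described above can be modeled as a simple worklist traversal. The C sketch below is a toy model, not rld's implementation: struct object, MAX_OBJS, and load_order are hypothetical names, and the breadth-first order is one plausible reading of "loads each shared object on the list, reads that object's liblist, and repeats".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJS 16   /* arbitrary limit for this toy model */

/* Hypothetical model of an object: its name and its liblist, given as
 * indices into the array of all known objects. */
struct object {
    const char *name;
    size_t liblist[MAX_OBJS];
    size_t nlibs;
};

/*
 * Fill order[] with object indices in the order they would be loaded,
 * starting from the main program (already loaded by exec), and return
 * how many objects were loaded. Each object is loaded only once, even
 * if several liblists name it. order[] must have room for nobjs entries.
 */
size_t load_order(const struct object *objs, size_t nobjs,
                  size_t main_idx, size_t *order)
{
    int loaded[MAX_OBJS] = {0};
    size_t head = 0, tail = 0;
    size_t i;

    assert(nobjs <= MAX_OBJS && main_idx < nobjs);
    loaded[main_idx] = 1;
    order[tail++] = main_idx;

    while (head < tail) {                 /* objects whose liblist is unread */
        const struct object *o = &objs[order[head++]];

        for (i = 0; i < o->nlibs; i++) {
            size_t dep = o->liblist[i];

            assert(dep < nobjs);
            if (!loaded[dep]) {
                loaded[dep] = 1;
                order[tail++] = dep;      /* "load" it; read its liblist later */
            }
        }
    }
    return tail;
}
```

Symbol fix-up and init code, the steps that follow loading in the description above, are outside the scope of this model.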
For a more complete discussion of DSOs, including answers to
questions frequently asked about them, see the dso(5) man
page.