Wednesday, September 29, 2010

LibreOffice: How Communities Have Grown Tired of Oracle

It seems that Oracle attitude towards some communities is starting to have some unwanted consequences. After dumping the OpenSolaris community with an obstinate silence and continuing to leave the Java Community Process in an embarrassing limbo, the OpenOffice community has decided to take action and announced the creation of The Document Foundation to take control of the OpenOffice productivity suite.

Meanwhile, Oracle has been requested to join the "new" community by applying to the Foundation. Other companies worried by Oracle latest moves, such as Google and Novell and Red Hat, are  already supporting the Foundation.

I wonder what Oracle will do and hope that Larry, at least once, does something so that Oracle be not the bad guy. This news makes me miss the Good Old Sun Microsystems.

The Death of OpenSolaris: Choosing an OS for a Java Developer

A Bit of History: The Struggles of OpenSolaris

This is no news: you probably know all about it.

As a long time Solaris user, the recent years have been full of good news for me.

I remember starting with GNU/Linux at home to have "something similar" to the Solaris workstations I used at work. It was the time when software would most likely compile on Solaris rather than on Linux.

Years later I bought my first Sun Workstation: it was the time when trying to compile on Solaris packages that would supposedly compile on a POSIX system was a pain. Still, I continued to regard Solaris as a stable platform to keep on using it for my work duties, such as Java programming.

Then came Solaris 10 and all of its wonderful technologies such as ZFS, Zones and DTrace, just to cite a few. With it, there came the Solaris Express distributions which, at last, filled a long standing gap between Solaris and other operating systems, providing us a pretty modern desktop environment.

In late 2008 came the first OpenSolaris distribution. I installed it, played with it, but kept on using SXCE for my workstations. The main reason was compatibility with many Sun packages, such as the Java Enterprise System or the Sun Ray Server Software, that had more than one glitch on OpenSolaris.

When SXCE was discontinued, I waited for the 2010.xx OpenSolaris release to upgrade my systems. Unfortunately, that release will never see the light.

The Oracle Leaked Memo (the dignifying uppercase is a must given Oracle prolonged silence over the subject) shed a light over Oracle plans for Solaris proper and OpenSolaris. Part of the "good news" is that the Solaris Express program has been resurrected and the first binary distribution is expected later this year.

The bad news is that the code, at least the parts of it that will be released with an open source license, will be released after the releases of the full Solaris Operating Systems. Basically, our privileged observation point over the development of the operating system has been shut down.

Lots of ink has been been spilled since the Leaked Memo and plenty of information, discussions and wars are going on in the blogosphere. I'm not an authoritative source to speak about the subject and it's not even clear to me what I'm going to do, now.

Benefits of Solaris for a Java Developer

Solaris has been my operating system of choice since before I started working in the IT industry. As a student, I grew up with Solaris at the data center of my University and the Slackware I used at home seemed like a kid toy, compared to it. After graduating, I started working as a design engineer for a leading microprocessors producer. Needless to say, Solaris was the platform we ran our design software upon. Then, I moved to a consulting firm and started my career as a Java architect.

Solaris was and is the platform of choice for most of the clients I've been working for. Even today, the application servers, the cluster software, the database, and most of the infrastructure used by my clients run on Solaris. It always seemed a sound choice to me, then, developing software on the same platform that will run it in production.

IDEs, Tools and Runtimes

As a Java developer, I can run all of my tools I need on a supported platform. My favorite IDEs (NetBeans and JDeveloper), the application servers my clients use (WebLogic and WebSphere, mostly), the databases used by my applications (MySQL, Oracle RDBMS, PostgreSQL): all of them run and are supported on Solaris. Some of them are even bundled with it or readily available by Sun sponsored package repositories. The Eclipse platform, to cite a widely use IDE for Java, is available in the OpenSolaris IPS repository, too.

Solaris Technologies

Solaris 10 also integrates DTrace, a powerful, unobtrusive framework that allows you to observe and troubleshoot application and OS problem in real time, even in production systems with an almost null overhead. DTrace has allowed us to diagnose strange production quirks with no downtime: once you've tried DTrace and the D language, there's no going back to "just" a debugger, even in the development stages of your project.

Other kinds of problems does not show up in your debugger or are extremely hard to catch. It might be the case of network or file systems problems. That's where DTrace comes in handy: it lets you observe with an incredibly high detail what's going on in your application and in the kernel of the operating systems, if it's necessary to dig so deep.

Solaris Virtualization Technologies

Solaris is also an ideal host virtualization platform. Solaris can "virtualize itself" with Containers, Zones and Logical Domains: you can start a Zone in no time (and almost no space overhead), assign a resource cap to it and also build a virtualized network in a box to simulate a complex network environment.

One of the problems that I encountered during the development of big enterprise system is that the development environment, and sometimes even the integration environment, is very different than the production one. It surely is a methodology problem: nevertheless, developers have few weapons to counteract. For example, applications that appear to run fine on a single node may not run correctly on a server cluster, or scale badly.

The more you wait to catch a bug, the more impact will have a patch for it. That's why in my development team, for example, we use Solaris Zones to simulate a network cluster of IBM WebSphere Application Servers and a DB cluster. All of them run in completely isolated Zones in one physical machine and communicate on a virtual network with virtual etherstubs (sort of a network switch), VLANs and routing rules. This environment lets us simulate exactly how the software will behave in the production system. Without a flexible and lightweight virtualization technology it would have been much more difficult and expensive to prepare a similar environment.

And if you (still) need to run other platforms, you can use Xen or VirtualBox to run, for example, your favorite Linux distro, Windows, or *BSD.


Enumerating the advantages of Solaris is difficult in such a reduced space, however I'll try:
  • It's a commercially supported operating system: that's an option, since Solaris is free for development purpose. Nonetheless, it's an important point to take into account.
  • Is (very) well documented: there's plenty of official and unofficial documentation.
  • It's easy to administer: Solaris is pretty easy to administer, even if you're not a seasoned system administrator.
  • It's an UNIX system: with all of its bells and whistles.
  • It is a great virtualization platform.
  • It has some unique technologies that add much value to its offering, such as ZFS and DTrace.

If you're a Java developer and haven't given Solaris I try, I strongly suggest you do it. Maybe you'll start to benefit from other Solaris 10 technologies such as Zones and ZFS, even for running your home file or media server.


I often hear complaints about Solaris coming from different sources and with the most imaginative arguments: proprietary, closed, old, difficult to use. I usually answer inviting users to try it and see for themselves before judging it (if that's the case). Most of the times I'm not surprised to discover that the complaining guy had minimal or null exposure to Solaris.

Also, I'd like to point out that no OS I tried is a swiss army knife. Solaris is a server-oriented OS with a good desktop but it's not comparable with other operating systems for such an use. So: no comparison with Linux, please. It would be so unjust as comparing Linux and Mac OS X for the average home user. ;)


Since Java "runs anywhere", there's plenty of choice for a Java developer.

Since I own a laptop with Mac OS X, I've built a small development environment with all of the tools I need. Mac OS X is a great operating systems that comes with many fancy features out of the box and, although it has some idiosyncrasy with Java (read: you have to use the JVM shipped by Apple), it's a good OS for a Java developer. Since the Mac OS X hype has begun, there's plenty of packages for it and a big ecosystem which is still growing. Still, many software packages run in the enterprise aren't supported on Mac OS X. Since I prefer to have an environment as close as possible as the production one, I think that OS X is not the best choice for the average Java EE architect.

I've also been an hardcore Slackware and Debian user for a long time. An enterprise Java developer would miss nothing in a modern GNU/Linux distribution, nowadays, and most of the software packages you'll find in the enterprise will run on your GNU/Linux distribution.

No need to talk about Windows, either.

So, why Solaris? Every OS has its own advantages and disadvantages. The point is to just recognize them. Mac OS X, in my opinion, is the best OS for a home user. I would change it for no Windows and no Linux. But as far as it concerns my developers' duties, every other OS just lacks the features and the stability that make Solaris great. ZFS, DTrace and Zones, for my use cases, are killer features.

What's Next?

You've decided to give Solaris a try, so: which is Your distribution? I don't know.

Solaris Express/Oracle Solaris

I strongly suspect that my wait will be prolonged and I will finally upgrade my machines as soon as Solaris Express has been released. Upgrading to Solaris 10 09/10 is not possible since I'm using some ZFS pools whose version is not yet supported by Solaris proper but it is a sound choice for a starter.

The advantage I see in using one of these versions is the availability of optional support and the good level of integration with the most commonly used software packages that Oracle is likely to guarantee.


You should also know that OpenSolaris sources have been (sort-of) forked and two new projects are born: Illumos and OpenIndiana. The project were started by Nexenta employees and volunteers of the OpenSolaris community. The first projects aims at maintaining the OpenSolaris code and the parts of the code that are closed or code that upstream might choose not to maintain. The OpenIndiana project aims at producing binary distribution of the operating system built upon the Illumos source code. OpenIndiana will provide a really open source, binary compatible alternative to Solaris and Solaris Express.

Sounds good and I'll willingly support it. In the meantime I've installed OpenIndiana in a couple of virtual machines and the first impressions are very good. I suppose it hasn't passed enough time yet for diverging changes to have emerged.

If you prefer a more modern desktop with a recent Gnome interface, drop Solaris 10 and go for OpenIndiana, if you don't feel like waiting for Solaris Express. In any case, switching between the two shouldn't pose any problems. What's clear to me is that I won't consider using both operating systems: I'll have to make a choice.

Support Availability

As an enterprise user and a Java developer, I've always been more concerned about OS support and support for the packages I use, rather than about eye candy. Even at the cost of running a proprietary platform.

In conclusion: I'll wait for Solaris Express to be released and only then will decide which one I'll use for my purposes between Oracle Solaris Express and  OpenIndiana. My heart is betting for OpenIndiana. My brain is betting for Oracle Solaris Express and Solaris proper. Only time will tell which one is right (for me.)


A follow-up of this blog post is avaible at this address. In this post I'll try to summarize some use cases in which the technology we introduced in this post are effective and add real value to your development duties.

I hope you enjoy it.

Tuesday, September 28, 2010

Encoding an Unicode Character to UTF-8


I'm living in an UTF-8 world. Everthing, from the operating systems I use (Solaris, Linux and Mac OS X) to my terminals. Even the file systems I use must be able to support the UTF-8 names I give to my files.

UTF-8 is a flexible and well supported encoding for Unicode and it's supported, out of the box, on all the operating systems I use. UTF-8 allows me not to worry (ever) about the characters I use in file names or in the files I write. Being a native Italian living in Spain, it's essential for me to have an encoding that supports all of the languages I use in my daily file.

UTF-8 is the perfect solution: it's ASCII backwards compatible, so I need no conversion when writing ASCII files, and can encode every characters in the Unicode character set. No need to worry if one day I had to write some Japanese file name. The only "drawback" it may have it's a little space overhead compared to some other encodings (such as UTF-16) but in this particular case, UTF-8 beats them all.

In such an UTF-8 world, there's no need to worry: just use the tools you have and everything will be fine.

Well, almost. Sometimes, as described in that post, I need to know exactly how UTF-8 encodes a specific Unicode code point. Some other times, since I'm a sed and grep addict, it's really handy knowing what to look for. Some days ago, for example, I had look for file names that contained specific code points to correct an idiosyncrasy between Subversion and Mac OS X (I still cannot believe it...). In such a cases, an UNIX terminal and its utilities are really your best friends. Since the encoding process is very easy, I'm quickly describing it to you in the case you need it.

The UTF-8 Encoding

The UTF-8 encoding is a variable width encoding in which each Unicode code point can be encoded as a sequence of 1 to 4 octects. Each octect is composed by a heading byte and trailing bytes. Since the encoding is a variable width one, you need a way to tell where a sequence starts and where it ends. That information is stored in the head byte.

Heading Byte

The head byte can take one of these forms:
  • 0xxxxxxx: Single byte.
  • 110xxxxx: Head of a sequence of two bytes.
  • 1110xxxx: Head of a sequence of three bytes.
  • 11110xxx: Head of a sequence of four bytes.

You may have noticed that the number of 1s in a heading byte tells how long the sequence is. If the heading byte starts with 0, the sequence is single byte.

The 0xxxxxxx format guarantees that ASCII characters in the [0, 127] range are encoded by the identical function thus guaranteeing the backwards compatibility with ASCII.

Trailing Bytes

Trailing bytes in a multibyte UTF-8 sequence always have the following form:
  • 10xxxxxx


The 0s and 1s in the heading and trailing bytes format described in the previous sections are called control bits. In the UTF-8 encoding, the concatenation of the non control bits is the scalar value of the encoded Unicode code point. Because of the structure of the aforementioned forms, the number of required bytes to encode a specific Unicode code point with UTF-8 depends on the Unicode range where it falls (the ranges are given both hexadecimal and binary forms:
  • [U+000, U+007F] [00000000, 01111111]: one byte.
  • [U+0080, U+07FF] [10000000, 111 11111111]: two bytes.
  • [U+0800, U+FFFF] [1000 00000000, 11111111 11111111]: three bytes.
  • [U+10000, U+10FFFF] [1 00000000 00000000, 10000 00000000 00000000]: four bytes.

One Byte Range

In the one byte range, meaningful bits of the Unicode code points are in then [0, 6] range. Since the single byte form has a 1 bit overhead (the leading 0), a single byte is sufficient:

[0xxxxxxx  0xxxxxxxx]
[00000000, 011111111]

Two Bytes Range

In the two bytes range, meaningful bits of the Unicode code points are in the [0, 10] range. Since the head of a sequence of two bytes has a 3 bits overhead and the form of a trailing byte has a 2 bits overhead, there's room for exactly 11 bits.

[     fff ffssssss,      fff ffssssss]
[00000000 10000000, 00000111 11111111]

(* f are bits from the first byte of the encoded sequence)
(** s are bits from the second byte of encoded the sequence)

The encoded sequence is:

[110fffff 10ssssss]

Three Bytes Range

In the three bytes range, meaningful bits of the Unicode code points are in the [0, 15] range. Since the head of a sequence of three bytes has a 4 bits overhead and the form of a trailing byte has a 2 bits overhead, there's room for exactly 16 bits.

[ffffssss sstttttt, ffffssss sstttttt]
[00001000 00000000, 11111111 11111111]

(* f are bits from the first byte of the sequence)
(** s are bits from the second byte of the sequence)
(*** t are bits from the third byte of the sequence)

The encoded sequence is:

[1110ffff 10ssssss 10tttttt]

Four Bytes Range

In the four bytes range, meaningful bits of the Unicode code points are in the [0, 20] range. Since the head of a sequence of four bytes has a 5 bits overhead and the form of a trailing byte has a 2 bits overhead, there's room for exactly 21 bits.

[   fffss sssstttt tthhhhhh,    fffss sssstttt tthhhhhh]
[00000001 00000000 00000000, 00010000 11111111 11111111]

(* f are bits from the first byte of the sequence)
(** s are bits from the second byte of the sequence)
(*** t are bits from the third byte of the sequence)
(**** h are bits from the third byte of the sequence)

The encoded sequence is:

[11110fff 10ssssss 10tttttt 10hhhhhh]


You know see how easy is to convert an Unicode code point to its representation in the UTF-8 encoding. The UTF-8 encoding of the Byte Order Mark (BOM) character, for example, whose code point is U+FEFF, can be easily be computed as explained above.

The U+FEFF code point falls in the three byte range and its binary representation is:

[F    E    F    F   ]
[1111 1110 1111 1111]

According the the aforementioned rule, the conversion gives:

[E   F    B   B    B   F   ]
[11101111 10111011 10111111]

that corresponds to the \xEF\xBB\xBF expression used with sed in our previous example.

EJB 3.1 Global JNDI Access

Table of Contents

As outlined in the previous parts of this series, the major drawback of the EJB v. 3.0 Specification was the lack of portable global JNDI names. This implies that there's no portable way to link EJB references to a bean outside your application.

The EJB v. 3.1 Specification fills this gap defining, in its own words:

"a standardized global JNDI namespace and a series of related namespaces that map to the various scopes of a Java EE application."

This blog post will give you an overview of the Global JNDI Access as defined by the EJB v. 3.1 Specification.

Namespaces and Scopes

The EJB v. 3.1 Specification defines three distinct namespaces with its own scopes:
  • Global.
  • Application.
  • Module.

The specification requires compliant containers to register all of the session beans with the required JNDI names. Such standardized names are portable and your application components will be able to establish a reference to an EJB using a name that is portable across application servers.


Names in the global namespace will be accessible to code in any application and conform to the following syntax:


<app-name> is the name of the Java EE application as specified in its standard deployment descriptor (application.xml) or, by default, the base name of the deployed EAR archive. This path fragment is used only if the session EJB is deployed within a Java EE application EAR file.

If the session EJB is deployed in an EAR file, its <module-name> is the path name of the Java EE module that contains the bean (without extension) inside the EAR file. If the session bean is deployed as a standalone Java EE component in a JAR file or as part of a Java EE web module in a WAR file (which is now allowed by the Java EE 6 Specification), the <module-name> is the name of the archive (without extension). The <module-name> value can be overridden by the <module-name/> element of the component's standard deployment descriptor (ejb-jar.xml or web.xml).

The <bean-name> is the EJB name as specified by the mechanisms described in the previous parts of this blog post.

The <interface-FQN> part is the fully qualified name of the EJB business interface.

The container has to register one JNDI global entry for every local and remote business interface implemented by the EJB and its no-interface view.

Session EJB With One Business Interface or a No-Interface View

If an EJB implements only one business interface or only has a no-interface view, the container is also required to register such a view with the following JNDI name:



Names in the application namespace will be accessible only to code within the same application and conform to the following syntax:


Each path fragment retains the same meaning described for the global namespace JNDI names syntax in the previous section.

The same publishing rules for a compliant container described in the previous section apply.


Names in the module namespace will be accessible only to code within the same module and conform to the following syntax:


Once more, each path fragment retains the same meaning described for the global namespace JNDI names.

The same publishing rules for a compliant container described in the global namespace section.

Local Clients

It may be important to notice that, although global JNDI names for local interfaces (and no-interface views) are published, this does not imply that such an interface will be accessible to components running in another JVM.


The EJB v. 3.1 Specification, and other specifications in the Java EE 6 Platform, brings simplicity and adds many new features and tools to the developers' toolboxes. "Global JNDI names" is an outstanding, although simple, feature because it finally fills a long-standing portability limitation of the previous revisions of this specification and, hence, of the entire Java EE platform.

EJB 3.0 and EJB 3.1 provide a powerful, portable, yet simple component model to build enterprise applications. The "EJB sucks" day are gone but only time will tell if this technology will regain the trust of Us, The Developers.

As far as it concerns myself, I feel pretty comfortable with Java EE 6, EJBs, CDI beans, the good support I have from IDEs such as NetBeans or JDeveloper (although the latter does not support EJB 3.1 yet), and all the specifications that build up this venerable platform.

A Shell Script to Find and Remove the BOM Marker

As pointed out by Omri, the script is failing on OS X apparently because of an idiosyncrasy in Apple's sed implementation. I temporarily fixed the script switching from sed to perl on OS X: perl is also shipped by default on OS X so there shouldn't be any problem. However, on OS X this version of the script scans by default the entire file, and not only the first line as it does with other sed implementations.


Have you ever seen this characters while dumping the contents of some of your text files, haven't you?

If you have, you found a BOM marker! The BOM marker is a Unicode character with code point U+FEFF that specifies the endianness of an Unicode text stream.

Since Unicode characters can be encoded as a multibyte sequence with a specific endianness, and since different architectures may adopt distinct endianness types, it's fundamental to signal the receiver about the endianness of the data stream being sent. Dealing with the BOM, then, it's part of the game.

If you want to know more about when to use the BOM you can start by reading this official Unicode FAQ.

This post has been modified to solve some problems and improve the script according to your comments:

  • Solved some mktemp inconsistencies across UNIX flavours (such as Solaris and Mac OS X).
  • Files can now be filtered by extension using the -e option, as suggested by Goldan.
  • BOM can be removed throughout the file using the -a option, as suggested by Goldan.
  • An arbitrary number of files can be safely passed as a parameter.
  • The script behaves correctly even with filenames with whitespaces in it.
Safe Harbour Statements: I try to test the script in the greatest number of systems but I'm not guaranteeing that it is working correctly on your. I'll be glad if you give me your feedback: any suggestion or bug report will be appreciated.


UTF-8 is one of the most widely used Unicode characters encoding on software and protocols that have to deal with textual data stream. UTF-8 represents each Unicode character with a sequence of 1 to 4 octects. Each octect contains control bits that are used to identify the beginning and the length of an octect sequence. The Unicode code point is simply the concatenation of the non control bits in the sequence. One of the advantages of UTF-8 is that it retains backwards compatibility with ASCII in the ASCII [0-127] range since such characters are represented with the same octect in both encodings.

If you feel curious about how the UTF-8 encoding works, I've written an introductory post about it.

Common Problems
Because of its design, the UTF-8 encoding is not endianness-sensible and using the BOM with this encoding is discouraged by the Unicode standard. Unfortunately some common utilities, notably Microsoft Notepad, keep on adding a BOM in your UTF-8 files thus breaking those application that aren't prepared to deal with it.

Some programs could, for example, display the following characters at the beginning of your file:

A more serious problem is that a BOM will break a UNIX shell script interfering with the shebang (#!).

A Shell Scripts to Check for BOMs and Remove Them

The Byte Order Mark (BOM) is a Unicode character with code point U+FEFF. Its UTF-8 representation is the following sequence of 3 octects:

1110 1111 1011 1011 1011 1111
E    F    B    B    B    F

The quickest way I know of to process a text file and perform this operation is sed. The following syntax will instruct sed to remove the BOM from the first line of its input file:

sed '1 s/\xEF\xBB\xBF//' < input > output

A Warning for Solaris Users
I haven't found a way (yet) to correctly use a sed implementation bundled with Solaris 10 to perform this operation, neither using /usr/bin/sed nor /usr/xpg4/bin/sed. If you're a Solaris user, please consider installing GNU sed to use the following script.

The quickest way to install sed and a lot of fancy Solaris packages is using Blastwave or OpenCSW. I've also written a post about loopback-mounting Blastwave/OpenCSW installation directory in Solaris Zones to simplify Blastwave/OpenCSW software administration.

A Suggestion for Windows Users
If you want to execute this script in a Windows environment, you can install CygWin. The base install with bash and the core utilities will be sufficient for this script to work on your CygWin environment.

This is the source code of a skeleton implementation of a bash shell script that will remove the BOM from its input files. The script support recursive scanning of directories to "clean" an entire file system tree and a flag (-x) to avoid descending in a filesystem mounted elsewhere. The script uses temporary files while doing the conversion and the original file will be overwritten only if the -d option is not specified.


set -o nounset
set -o errexit


if [ $(uname) == "SunOS" ] ; then
  if [ -x /usr/gnu/bin/sed ] ; then
    echo "Using GNU sed..."
  TMP_OPTS="-p "

if [ $(uname) == "Darwin" ] ; then
  TMP_OPTS="-t tmp"

  SED_EXEC="perl -pe"
  echo "Using perl..."


function usage() {
  echo "bom-remove [-adrx] [-s sed-name] [-e ext] files..."
  echo ""
  echo "  -a    Remove the BOM throughout the entire file."
  echo "  -e    Look only for files with the chosen extensions."
  echo "  -d    Do not overwrite original files and do not remove temp files."
  echo "  -r    Scan subdirectories."
  echo "  -s    Specify an alternate sed implementation."
  echo "  -x    Don't descend directories in other filesystems."

function checkExecutable() {
  if ( ! which "$1" > /dev/null 2>&1 ); then
    echo "Cannot find executable:" $1
    exit 4

function parseArgs() {
  while getopts "adfrs:e:x" flag
    case $flag in
      r) RECURSIVE=true ;;
      f) PROCESSING_FILES=true ;;
      s) SED_EXEC=$OPTARG ;;
      e) USE_EXT=true ; FILE_EXT=$OPTARG ;;
      d) DELETE_ORIG=false ; DELETE_FLAG="-d" ;;
      x) XDEV="-xdev" ;;
      *) echo "Unknown parameter." ; usage ; exit 2 ;; 

  shift $(($OPTIND - 1))

  if [ $# == 0 ] ; then
    exit 2;

  # fixing darwin
  if [[ $ISDARWIN == true && $PROCESSALLFILE == false ]] ; then
    echo "Process all file is implicitly set on Darwin."


  if [ ! -n "$FILES" ]; then
    echo "No files specified. Exiting."

  if [ $RECURSIVE == true ]  && [ $PROCESSING_FILES == true ] ; then
    echo "Cannot use -r and -f at the same time."
    exit 1

  checkExecutable $SED_EXEC
  checkExecutable $TMP_CMD

function processFile() {
  if [ $(uname) == "Darwin" ] ; then
    TEMPFILENAME=$($TMP_CMD $TMP_OPTS"$(dirname "$1")")
  echo "Processing $1 using temp file $TEMPFILENAME"

  if [ $PROCESSALLFILE == false ] ; then 
    cat "$1" | $SED_EXEC '1 s/\xEF\xBB\xBF//' > "$TEMPFILENAME"
    cat "$1" | $SED_EXEC 's/\xEF\xBB\xBF//g' > "$TEMPFILENAME"

  if [ $DELETE_ORIG == true ] ; then
    if [ ! -w "$1" ] ; then
      echo "$1 is not writable. Leaving tempfile."
      echo "Removing temp file..."
      mv "$TEMPFILENAME" "$1"

function doJob() {
  # Check if the script has been called from the outside.
  if [ $PROCESSING_FILES == true ] ; then
    for i in $(seq 1 ${#FILES[@]})
      echo ${FILES[$i-1]}
      processFile "${FILES[$i-1]}"

    # processing every file
for i in $(seq 1 ${#FILES[@]})
      # checking if file or directory exist
      if [ ! -e "$CURRFILE" ] ; then echo "File not found: $CURRFILE. Skipping..." ; continue ; fi
      # if a paremeter is a directory, process it recursively if RECURSIVE is set
      if [ -d "$CURRFILE" ] ; then
        if [ $RECURSIVE == true ] ; then
          if [ $USE_EXT == true ] ; then
            find "$CURRFILE" $XDEV -type f -name "*.$FILE_EXT" -exec "$0" $DELETE_FLAG $PROCESSALLFILE_FLAG -f "{}" \;
            find "$CURRFILE" $XDEV -type f -exec "$0" $DELETE_FLAG $PROCESSALLFILE_FLAG -f "{}" \;
          echo "$CURRFILE is a directory. Skipping..."
        processFile "$CURRFILE"

parseArgs "$@"

Assuming the script is in your $PATH and it's called bom-remove, you can "clean" a bunch of files invoking it this way:

$ bom-remove file-to-clean ...

If you want to clean the files in an entire directory, you can use the following syntax:

$ bom-remove -r dir-to-clean ...

If your sed installation is not in your $PATH or you have to use an alternate version, you can invoke the script with the following syntax:

$ bom-remove -s path/to/sed file-to-clean ...

If you want to clean a directory in which other file systems might be mounted, you can use the -x option so that the script does not descend them:

$ bom-remove -xr dir-to-clean ...

Next Steps

The most effective way to fight the BOM is avoiding spreading it. Microsoft Notepad, if there's anybody out there using it, isn't the best tool to edit your UTF-8 files so, please, avoid it.

However, should your file system be affected by the BOM-desease, I hope this script will be a good starting point to build a BOM-cleaning solution for your site.


Sunday, September 26, 2010

References to EJBs Outside Your Application With Oracle WebLogic

Table of Contents

In the previous posts we made an overview of the EJB v. 3.0 and of the portable mechanisms it provides you to build your Java EE application. Since Java EE specifications are all about portability, at the risk of repeating ourselves we've often stressed the most important portability limit still present on the EJB v. 3.0 specifications: there is no portable way to declare and link a reference to an EJB outside your application.

Although there exist other standards (such as Web Services) that let you loosely couple components of your applications, remote EJBs are still an ideal mean to accomplish this task because of their simplicity, their standardization, the good development support of many IDEs and the good performance they provide.

In this blog post we'll make an overview of the mechanisms provided by one of the leading Java EE application servers, Oracle WebLogic, to give support to references to EJBs outside the scope of your application.

mappedName in Oracle WebLogic

Java EE compliant application servers provide additional non-portable API, tools and mechanisms used to enhance the standard Java EE features they implement. One of the features we've mentioned in the first part of this series was the mappedName @EJB element.

Although the EJB v. 3.0 Specification is willingly unclear about this element (that was sort of replaced by the lookup element introduced by the EJB v. 3.1 Specification), many application server vendors have implemented it with the intuitive behavior suggested by its own name: mapping an EJB to a global JNDI name.

If you want to trade portability for simplicity, many application servers (such as Oracle WebLogic or GlassFish) will let you define a bean's global JNDI name with the value of the mappedName element (or its corresponding deployment descriptor element.) As already stated, beware that Oracle WebLogic will assign global JNDI names to remote business interfaces only. This isn't really a limitation since local business interfaces can always be referenced using the APIs described in the previous parts of this series.

Oracle WebLogic Naming Conventions for EJB Remote Business Interfaces

With such a mechanism in place, linking a reference to a bean outside your application is straightforward. JDeveloper's EJB wizard, indeed, will put a default mappedName for you with an intuitive naming scheme that resembles somehow the new portable global JNDI names introduced by the EJB v. 3.1 specification, as shown in the following screenshot:

The naming scheme suggested by JDeveloper is the following:

mappedName = [application-name]-[module-name]-[bean-name]

If adopted, this naming scheme provides an easy way to assign every bean an unique name throughout your applications. I recognize that such names are a bit awkward but, being non-portable, is a naming scheme as good as any other.

Global JNDI Names for Remote EJB Interfaces in Oracle WebLogic

The global JNDI names of the remote business interfaces of an EJB with a mappedName in Oracle WebLogic Application Server will be:


Using this naming scheme will allow you to build loosely coupled Java EE application that reuse each other EJBs. If you want to inspect your server JNDI tree and check the actual names of your deployed EJB, you can use WebLogic's JNDI Tree inspector, which can be launched opening the WebLogic console, navigating to the Environment/Servers/[your-server] page and using the View JNDI Tree link. In the following screenshot you can examine the global JNDI entry for a bean defined as follows:

package es.reacts;

import ...;

@Stateless(name = "EJBByMappedName", mappedName = "Application1-EjbTest0-EJBByMappedName")
public class EJBByMappedNameBean implements RemoteByMappedName {

In the screenshot you can appreciate the entry corresponding to the es.reacts.RemoteByMappedName business interface.

Customizing the JNDI Name of an EJB Remote Interface

Oracle WebLogic provides you the necessary tools to customize and override its default naming conventions for EJB remote interfaces. To assign or override the global JNDI name of an EJB remote interface you can use the WebLogic specific deployment descriptors. In the case of an EJB module, for example, you can use JDeveloper to quickly add a default weblogic-ejb-jar.xml or, if using another IDE such as NetBeans, you can create a new XML file named weblogic-ejb-jar.xml in the META-INF directory of your module. An empty weblogic-ejb-jar.xml file looks like this (as of Oracle WebLogic 10.3):

<?xml version = '1.0' encoding = 'UTF-8'?>
<weblogic-ejb-jar xmlns:xsi=""

To assign or override a global JNDI name for a given EJB remote interface you can use the following elements:

<?xml version = '1.0' encoding = 'UTF-8'?>
<weblogic-ejb-jar xmlns:xsi=""

If you're using JDeveloper, it provides you an easy GUI to edit the weblogic-ejb-jar.xml file:

The JDeveloper GUI will let you easily customize the WebLogic deployment descriptor and configure other non-portable features of the WebLogic Application Server such as EJB clustering.

Linking an EJB Reference to a Global JNDI Name

In the previous section we made an overview of the tools that Oracle WebLogic Application Server provides to customize the execution environment and establish a global JNDI name for a remote interface of an EJB. In the same way, Oracle WebLogic provides you other tools to link an EJB reference to a specific target EJB using a global JNDI name.

In the examples seen so far, we've always linked an EJB reference to a target using the portable mechanisms provided by the EJB v. 3.0 Specification. If you need to establish a target to a remote EJB outside the scope of your application, you can use the WebLogic-specific deployment descriptor of the module that acts as the EJB client. In the case of the Java EE module we've used so far, you can use JDeveloper to add the WebLogic-specific deployment descriptor called weblogic.xml. If you're using other IDEs, the skeleton of this file is the following (as of Oracle WebLogic 10.3):

<?xml version = '1.0' encoding = 'UTF-8'?>
<weblogic-web-app xmlns:xsi=""

This weblogic.xml deployment descriptor links the EJB reference ejb/myGlobalRef with the object stored with the global JNDI name global-jndi-name, that is the name we've specified in the weblogic-ejb-jar.xml file described in the previous section. This reference will be declared as usual with a @EJB annotation or in the standard deployment descriptor (in this case, the web.xml file).

If you're using JDeveloper, a GUI will help you build the WebLogic-specific deployment descriptor. The GUI is well done and is also able to scan you module's EJB references and let them choose from a list when customizing them.

Next Steps

In this part we've learnt how to use WebLogic-specific tools in the case we had to link an EJB reference to an EJB that lives outsides our application using a global JNDI name. In the following part (coming soon) we'll see how the EJB v. 3.1 Specification has (finally) filled this gap defining portable JNDI names. With EJB v. 3.1 you'll be able to build modular Java EE application that reuse components of one another without the need to rely on non-portable mechanisms any longer.

EJB Programmatic Lookup

Table of Contents

In our previous post we learnt about EJB references and EJB injection. Even though EJB injection is a powerful container facility that eases modular application development, sometimes performing a programmatic EJB lookup is instead desirable.

Let's suppose, for example, that a set of distinct EJBs implement a common strategy, defined by a common business interface. Depending on the result of some choice algorithm (such as a business rule), a different strategy is chosen and hence a different EJB will be executed in the scope of a business process. In such a scenario the target EJB cannot be chosen at injection time since annotation elements (such as @EJB's) are defined at compilation time and deployment descriptors are defined at deployment time. The only solution to this problem is a programmatic JNDI lookup.

The same mechanisms described in the previous posts will apply. EJB references will be declared and linked against a name in your application private namespace using the @EJB annotation or the corresponding elements of your Java EE module deployment descriptor.

Lookup in the Application Private Namespace

The portable way to establish an indirection level between the namespace used in your lookup code and the target EJBs is using your application private namespace. This kind of indirection level is quite common in the Java EE platform: not only it is used for EJB references but for all sorts of resource references such as JDBC data sources, JMS queues, JavaMail sessions, etc.

In the case of EJBs, as seen in our previous post, you simply define a private name used by lookup and injection code of your application. This is a private application-scoped name and is a subelement of the java:comp/env JNDI entry. With the aid of the @EJB annotation and of the deployment descriptors, you can establish a link between this name and a target EJB. The only difference is that instead of relying on the container to inject a reference into a component of yours, your application algorithms will choose the appropriate EJB and will look it up dynamically.

As we've seen in part 2 of this series, the @EJB annotation can be used at type, method and field level to declare a reference to an EJB and, optionally, to link it to the target bean without the need of writing any deployment descriptor code.

If the case of dynamic programmatic JNDI lookup, instead of annotating a field (or a property) as an injection target, you can annotate a class (such as a Servlet) to establish a reference to an EJB. In the following example we'll see how to do it with both the @EJB annotation and the deployment descriptor.

Declaring a Reference to an EJB

In the test servlet we've used throughout our previous posts, we can use the @EJB annotation at class level to declare a reference to an EJB in the private name ejb/ejbLocalRef:

@EJB(name = "ejb/ejbLocalRef",
     beanInterface = es.reacts.SessionTest0Local.class,
     beanName = "EJBServer1.jar#SessionTest1")
public class ServletTest1 extends HttpServlet {

The annotation in the previous example is functionally equivalent to the following deployment descriptor (in this case, the web.xml file) fragment:


The most important difference between the @EJB semantics in this example and in the examples in the previous post is that in this case we're providing all of the information required to establish the reference and the link to the target EJB without injecting nor even relying on information coming from the injection target (such as the beanInterface).

Although the annotation is applied at a class level, it is effectively equivalent to adding the corresponding deployment descriptor elements and, therefore, the declared reference will be available throughout your Java EE module. In this case, any other servlet in your Java EE web module will be able to inject or lookup the very same EJB referenced by the ejb/ejbLocalRef name:

@EJB(name = "ejb/ejbLocalRef")
SessionTest0Local lc4;

Additional "plumbing" here is not necessary since the reference declaration contains all of the information that is required to resolve the target EJB.

EJB Programmatic Lookup

Since a reference has been declared and linked, our code is now able to make a JNDI lookup and retrieve a reference to the desired business interface of our target EJB. The JNDI lookup code is the good ole lookup code we're used to (with a little difference we'll point out later):

InitialContext ctx = new InitialContext();
Object obj = ctx.lookup("java:comp/env/ejb/ejbLocalRef");
if (obj instanceof SessionTest0Local) {
  SessionTest0Local lc = (SessionTest0Local) obj;

(* Please note that the previous fragment has been stripped of the required exception handling code.)

The good news with EJB 3.0 is that you don't need to narrow the reference using the PortableRemoteObject.narrow() method as was required by the EJB v. 2.1 Specification. In the example code we can directly test the reference with the instanceof operator and use a Java native cast to set a SessionTest0Local reference.

There's absolutely no difference between looking up local and remote business interfaces. Only in the case you rely on the deployment descriptor, the declaration and the linking of the EJB references will be performed using an <ejb-ref/> or an <ejb-local-ref/> according to the EJB business interface type. As far as it concerns your application, the lookup code will be identical.


In scenarios in which you don't use EJB injection and rely on lookup instead, there are both advantages and disadvantages in using annotations or the deployment descriptors to declare and link EJB references.

The advantages of annotations is that they're easier to write and use than the corresponding deployment descriptors elements. Moreover, as far as it concerns my experience, IDE support for code autocompletion might be a better tool than some deployment descriptor editors which are "obscure", at best (with notable exceptions such as Oracle JDeveloper's and NetBeans'.)

The advantage of the deployment descriptor is that it can centralize resource references declarations. If the same EJB reference is used throughout the code of your Java EE module and is not confined to a single class, a best option is using the deployment descriptor to declare and link the EJB (and other resource) reference and avoid using annotations at all. This is a design choice that has to be taken carefully. In the uses cases in which a lookup is desirable, chances are that EJB linking might be performed in the application assembly and deployment stages. Having references well documented and declared in a central repository might still be preferable instead of having @EJB annotation scattered throughout your code and the task of the deployer might get considerably eased.

Next Steps

In the following part we'll see how application server-specific mechanisms will help us build modular Java EE applications and link EJB references to EJB remote business interfaces outside our application. We'll explore the tools provided by a leading Java EE compliant application server: Oracle WebLogic.

Thursday, September 23, 2010

Basic EJB References, Injection and Lookup

Table of Contents

In the first part of this series we've introduced the mechanisms provided by the Enterprise JavaBeans v. 3.0 Specification to define EJB components, declare a reference to an EJB and wiring them up both by dependency injection or programmatic JNDI lookup.

In this blog post we'll examine some basic examples to understand how to use the EJB API.

A Basic EJB

An EJB is basically a POJO with some extra EJB metadata. The required metadata to deploy it as an EJB component can be provided both by using the EJB annotations or by the standard deployment descriptor. The following class implements a very basic stateless session EJB:

package es.reacts;

import javax.ejb.Stateless;

@Stateless(name = "UniqueLocalSessionEJB")
public class UniqueLocalSessionEJBBean implements UniqueLocalBusinessInterface {
    public UniqueLocalSessionEJBBean() {
    public String sayLocalHello() {
      return this.getClass().getName() + "::" + "Local hello.";

As you may recollect from our previous blog post, the @Stateless annotation is used to define a stateless session bean. The optional name element is used to define the session bean name. This element is analogous to the <ejb-name/> element of the standard deployment descriptor. This element defaults to the unqualified name of the bean class (UniqueLocalSessionEJBBean in the example above), and the example above uses it to rename the bean to UniqueLocalSessionEJB.

Since we're using the @Stateless annotation, there's no further need to declare the EJB in the deployment descriptor.

In this example, we're assuming that the EJB is packaged in an EJB module that depends on the module containing the definition of its business interface (as explained in the following section.)

Business interfaces

Every EJB implements one or more business interfaces. Business interfaces can be local or remote. The most important differences between the two types of business interfaces can be summarized as follows:
  • Local business interfaces use pass-by-reference semantics for their methods and method invocations cannot cross the JVM boundary. Local business interfaces are available only to callers in the same application and JVM instance of the callee.
  • Remote business interfaces use pass-by-value semantics for their methods and method invocations can cross the JVM boundary. Remote business interfaces are available to callers outside the application of the callee.

In the previous example, the business interface UniqueLocalBusinessInterface is declared as follows:

package es.reacts;

import javax.ejb.Local;

public interface UniqueLocalBusinessInterface {
    String sayLocalHello();

In the EJB v. 3.0 world a business interface is just a plain old Java interface annotated with either the @Local or @Remote annotation.

Packaging Business Interfaces

In this example we're assuming that the EJB business interface is packaged in a JAR file that the EJB module depends on. Since EJB clients only depend on an EJB business interface, it's a good practice to package the business interfaces in a separate library in order to ease the interface distribution and to decouple them from their implementations.

Injecting an EJB into a Java Servlet

Now that we've defined an EJB, we're ready to use it from a servlet in a Java EE web module. Assuming that there's only one EJB implemented the UniqueLocalBusinessInterface in our application, we can inject it using an empty @EJB annotation:

package es.reacts;


import javax.ejb.EJB;

import javax.servlet.*;
import javax.servlet.http.*;

public class ServletTest1 extends HttpServlet {
  UniqueLocalBusinessInterface lc;

  public void doGet(HttpServletRequest request,
    HttpServletResponse response)
    throws ServletException, IOException {

The first thing to note is that the EJB is injected into our servlet by the application server since the bean interface alone is sufficient to identify the target EJB. In this case the beanInterface element of the @EJB annotation takes on its default value, as explained in our previous post, that is the type of the injected field: UniqueLocalBusinessInterface. Since there's just one EJB in the application that implements this business interface, the lc field of the servlet is injected with a reference to an instance of such a class.

The second thing worth pointing out is that we're injecting an EJB into a servlet field safely because the EJB is stateless. Since servlet are stateless by default, you should not inject stateful resources into servlet fields properties otherwise you may run into concurrency-related problems. If you needed to use a stateful EJB into a servlet, you should retrieve a reference by programmatic JNDI lookup since that will guarantee that a new instance is returned for every lookup operation.

Let's deploy and run our application and we'll see that the servlet is injected its target EJB and the method invocation to the sayLocalHello() method of its business interface is carried out correctly.

If we wanted to inject a reference to a remote interface the client code would not be affected. If you try and change the UniqueLocalBusinessInterface from @Local to @Remote, you'll see that the servlet sees no change and continues to work correctly.

What Happens If More Than One EJB Implements the Same Interface?

Let's suppose that the we add another EJB in our EJB module in this application that implements the same interface as the previous one, UniqueLocalBusinessInterface. In this case, since the bean interface is not sufficient any longer to determine the target bean for injection, you'll be returned an error. Deploying such an application in the WebLogic Application Server, for example, results in the following error being thrown:

[08:46:25 PM] Caused by: weblogic.deployment.EnvironmentException: [J2EE:160199]Error resolving ejb-ref 'es.reacts.ServletTest1/lc1' from module 'WebTest0' of application 'EJBTestApp'. The ejb-ref does not have an ejb-link and the JNDI name of the target bean has not been specified. Attempts to automatically link the ejb-ref to its target bean failed because multiple EJBs in the application were found to implement the 'es.reacts.UniqueLocalBusinessInterface' interface. Please specify a qualified ejb-link for this ejb-ref to indicate which EJB is the target of this ejb-ref.

Injecting a Reference to a Specific EJB Instance

To solve the problem occurred in the previous section we need to provide the application server with the required information to identify the target EJB. As explained in our previous post, we can use the following two methods:
  • Either we use the name element of the @EJB annotation (or the corresponding <ejb-ref-name/> element of the deployment descriptor) to declare an EJB reference in the private namespace of the application and link it to the target bean using the deployment descriptor.
  • Or we use the beanName element of the @EJB annotation (or the corresponding <ejb-link/> element of the deployment descriptor) to do it directly in our code.

Mapping an EJB into the Private Namespace

Using the first method we'll end up with the following code in our servlet:

@EJB(name = "ejb/bean-name")
UniqueLocalBusinessInterface lc;

and the following element in the deployment descriptor (web.xml) of our Java EE web module that acts as an EJB client:


The <ejb-link/> element contains the bean name that we defined at the beginning of our example with the annotation:

@Stateless(name = "UniqueLocalSessionEJB")

in the EJB implementation class.

Please note that, in this example, we used the @EJB name element explicitely but we could have established the link using its default value. The default value of the name element is:

[qualified class name]/[property or field name]

that, in this case, would be:


The disadvantage of using the default auto-generated name together with EJB linking using <ejb-link/> is that every time you refactor your code you'll have to check the deployment descriptors. Although developers sometimes think otherwise, the Java EE Specification defines some other roles such as the assambler and the deployer. In large corporate environments, it's not uncommon for such profiles to override developers' annotations to "plumb" the components used by the applications. Annotation override is one of the reasons why standard deployment descriptors still exist. This is especially true when references are to remote components, inside or outside your application. For this reason I suggest you do not rely on auto-generated names and use custom well documented names instead.

Linking an EJB to a Reference in the Private Namespace

The second method provides a direct way to link the reference to its target bean using the beanName element of the @EJB annotation. The servlet code will use the following EJB reference:

@EJB(beanName = "UniqueLocalSessionEJB")
UniqueLocalBusinessInterface lc;

and we need no additional information in the deployment descriptor.

Although this method allows the developer to link a reference to an EJB without relying on the deployment descriptor, the suggestion given at the end of the previous section is still valid. Remember that annotations can be overridden at deployment time! Do not link an EJB to a reference if you know beforehand that such a reference is eligible for override. In such cases, prefer assigning a name to the reference instead, as explained in the previous section.

Next Steps

In the following post we'll make a quick overview of how the EJB 3.0 API can be used to perform EJB programmatic lookups.