Pythian Blog: Technical Track

EBS 12.2 issues seen in the wild

Recently, I was involved in an EBS upgrade project and in operational support of an EBS production environment. Based on that work, here is an unsorted list of issues I've seen in or around the EBS core.

EBS architecture

EBS, or Oracle Applications, was designed around the Oracle database and is a very old product (it was first released when I was in kindergarten). Given its age, you would expect the product to be complex. And it really is.

Most of the old functionality is handled by Oracle Forms/Reports, while new functionality is slowly moving to the web and runs on top of the OA framework as a JEE application in Oracle WebLogic. The "old" part used to be client-server, but it was retrofitted to the web as well and is now available as a Java applet or a Java Web Start program. Forms/Reports provide desktop-like screens which are stable and work really well (provided there are no major issues on the DB and network side). They look a bit like they are from the 20th century though, which probably explains the desire to move to the OA framework and pure web.

From a Forms screen, a user can also start a long-running request in the background. It is run by a Concurrent Manager process - a separate Java process on an application node whose job is to run long-running tasks in the background. Every concurrent request is tracked and recorded, and you can view quite a bit of data about it in the FND_CONCURRENT_REQUESTS table. It's a great source of information about your program's runtime, and you can resubmit a request if needed.

Concurrent manager

There are a few inconveniences, though:
  • One nasty issue I've seen is with queries by internal concurrent managers accessing the FND_CONCURRENT_WORKER_REQUESTS and FND_CRM_WORKER_REQUESTS views. The text of those views is generated per client configuration, and it can become really big and ugly. So ugly that Oracle's query transformation engine, when it attempts an OR expansion, can spend a very long time in hard parse and use excessive PGA. There is a patch, 20355502, which is not enabled by default, and there's a regression/improvement to the fix in bug 27321179. A workaround is an SQL patch with _no_or_expansion=true.
  • There is no SID,SERIAL# in FND_CONCURRENT_REQUESTS. This is really odd and hard to explain, and it makes it difficult to map the actions of a particular request to Active Session History data once the request has finished. What I'd really like to see is the CM pushing each request's REQUEST_ID into the V$SESSION.ECID field, which is then captured by ASH. That would greatly simplify CM request diagnostics. SID,SERIAL# would still need to be stored somewhere close to REQUEST_ID as well.
  • Developers (of both standard and custom functionality) try to organize programs into small pieces of work and then combine them into a bigger task by submitting many of them as child requests, sometimes hundreds or thousands. This is troublesome in many ways: slow row-by-row processing, increased concurrency that is not needed most of the time, and large queues which take time to process. As a result, many programs that are supposed to be batch-style and fast behave like OLTP requests and take more time than necessary.
  • Some programs require post-processing of their results by a separate process, the Output Post Processor (OPP). Often the post-processing is an XML Publisher call that converts the results into rich text. If a template is complex and not designed to handle large outputs (and "large" here may be just a few thousand rows), OPP processes may get stuck in continuous Full GC or simply fail with java.lang.OutOfMemoryError. A workaround is to increase the heap size of the OPP process (luckily they are 64-bit, whereas the Forms server is still running 32-bit), or to avoid XML Publisher altogether.
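To put rough numbers on the child-request point above, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the function name, the fixed per-request overhead (queue pickup, session setup), the per-row cost and the number of workers are invented, not measured on any EBS system, and the model assumes ideal parallelism across workers.

```python
import math


def estimated_elapsed_seconds(total_rows, rows_per_request,
                              overhead_ms, per_row_ms, workers):
    """Rough wall-clock estimate for a job split into child requests.

    Hypothetical model: every request pays a fixed overhead plus a
    per-row cost, and the work is spread evenly across the workers.
    """
    requests = math.ceil(total_rows / rows_per_request)
    total_work_ms = requests * overhead_ms + total_rows * per_row_ms
    # More workers than requests cannot speed anything up further.
    effective_workers = min(workers, requests)
    return total_work_ms / effective_workers / 1000


# Assumed inputs: 100,000 rows, 5 s overhead per request, 1 ms per row,
# 10 manager workers.
many_small = estimated_elapsed_seconds(100_000, 100, 5_000, 1, 10)
few_large = estimated_elapsed_seconds(100_000, 10_000, 5_000, 1, 10)
print(f"1000 child requests: {many_small:.0f} s")
print(f"  10 child requests: {few_large:.0f} s")
```

With these made-up inputs the per-request overhead dominates: a thousand small requests come out at about 510 seconds, ten large ones at about 15 seconds. Real numbers will differ, but the shape of the problem is the same.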

OA

The EBS web front-end runs OA, which is a framework on top of ADF. The major OA issues come from the fact that the code has a lot of synchronized blocks. For example, each request from a logged-in user takes a lock in OA.jsp, so the next request in the same HTTP session has to wait until the first one is finished. It's not a big deal if all requests are fast. But if anything unusual happens and a single user request takes too much time, and the user then attempts to click something else in the same HTTP session, there will be no feedback from OA; the system will just be waiting for that single request to complete. Sometimes a single long-running request can also lead to other user/system threads of a managed server becoming stuck waiting on a lock. If that happens, the whole managed server gets stuck and has to be restarted.

Since there's locking involved in the code, there should be a chance for a deadlock as well. And there was - bug 26164249 fixes a deadlock in one of the core OA classes related to asynchronous logging.

The default connection pool setup is questionable; it's defined as having a minimum of one and a maximum of 500 connections. This leads to continuous growth of the connection pool because OA uses one connection per HTTP session. Oracle Support thinks that this is normal, by the way. It is certainly not okay; web applications are supposed to use database connections for a very short time and then give them back to the pool for reuse by other requests. When connections are reserved for as long as an HTTP session lasts, web applications act like desktop applications with a direct database connection. It also puts unnecessarily high requirements on the database server. If you need to support a few thousand logged-in users at any point, it won't be easy.
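The difference between holding a connection per HTTP session and borrowing one per request can be illustrated with a toy simulation. This is a Python sketch under stated assumptions, not OA or WebLogic code: the Pool class, the peak_connections function, and the session and concurrency numbers are all hypothetical. Only the maximum of 500 comes from the text above.

```python
class Pool:
    """Toy connection pool that only counts connections."""

    def __init__(self, max_size=500):
        self.max_size = max_size
        self.idle = 0        # opened connections waiting for reuse
        self.in_use = 0      # connections currently handed out
        self.peak_open = 0   # highest number of open connections seen

    def acquire(self):
        if self.idle > 0:
            self.idle -= 1                  # reuse an idle connection
        elif self.in_use >= self.max_size:
            raise RuntimeError("pool exhausted")
        # otherwise a new connection is opened
        self.in_use += 1
        self.peak_open = max(self.peak_open, self.idle + self.in_use)

    def release(self):
        self.in_use -= 1
        self.idle += 1


def peak_connections(sessions, requests_per_session, concurrent,
                     hold_for_session, max_size=500):
    """Peak open connections when `concurrent` sessions are active at once."""
    pool = Pool(max_size)
    holding = set()  # sessions that keep their connection until logout
    for _ in range(requests_per_session):
        for start in range(0, sessions, concurrent):
            wave = range(start, min(start + concurrent, sessions))
            for session in wave:
                if not hold_for_session:
                    pool.acquire()          # borrowed for one request
                elif session not in holding:
                    pool.acquire()          # kept for the whole session
                    holding.add(session)
            if not hold_for_session:
                for _session in wave:
                    pool.release()          # returned after the request
    return pool.peak_open
```

In this model, 300 logged-in sessions with 20 requests in flight at a time need 300 connections when each session holds its own, and 20 when connections are returned after every request. With more sessions than the pool maximum, the per-session variant simply runs out of connections.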
Another OA issue, which has happened a few times in different places, is java.lang.OutOfMemoryError caused by code unintentionally trying to fetch millions of rows; see, for example, bug 28219031. Such an issue renders the OA node unusable as well, and it has to be restarted.
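The general defense against this class of problem is to never materialize a whole result set in memory. Below is a minimal sketch of that idea in Python, using the standard DB-API fetchmany call against an in-memory SQLite database as a stand-in. It is not the OA code or the fix from the bug above; the table demo_lines and all function names are hypothetical.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one by one while holding at most one batch in memory."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


def total_amount(conn, batch_size=1000):
    """Aggregate over a large result set without calling fetchall()."""
    cur = conn.cursor()
    cur.execute("SELECT amount FROM demo_lines")
    total = 0
    for (amount,) in iter_rows(cur, batch_size):
        total += amount
    return total


def make_demo_db(rows):
    """Build a throwaway in-memory table (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_lines (amount INTEGER)")
    conn.executemany("INSERT INTO demo_lines VALUES (?)",
                     ((i,) for i in range(rows)))
    return conn
```

Memory use here is bounded by the batch size rather than by the number of rows the query happens to return, so a query that unexpectedly matches millions of rows becomes slow instead of fatal.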
