Pythian Blog: Technical Track

Mind your rdbms/audit on 12c

Recently we ran into an interesting question from one of our clients. They were seeing messages of the following form in syslog:

    Aug 11 11:56:02 ***** kernel: EXT3-fs warning (device ******): ext3_dx_add_entry: Directory index full!

I hadn't encountered this before, so I did a bit of research. My initial suspicion ended up being correct: it was due to too many files being created somewhere in that file system.

I had a look around, and eventually checked out the ORACLE_HOME of the ASM / Grid Infrastructure software, which is running version 12.1.0.2 on that host. I snooped around using du -sh to check which directories or sub-directories might be the culprit, and the disk usage utility came to a halt after the "racg" directory. Next in line would be "rdbms". The bulb lit up somewhat brighter now. Entering the rdbms/audit directory, I issued the command you normally would to look at a directory's contents: "ls". Five minutes later, there was still no output on my screen. Okay, we had found the troublemaker.

So we were now faced with a directory that potentially had millions of files in it. Certainly we are all aware that "rm *" isn't really able to cope with a situation like this. The shell would probably grind away for a couple of minutes expanding the file list, and then yell "Argument list too long" at us. Alternatively, we could use find, combined with -exec (bad idea), -delete, or even pipe into rm using xargs.

Looking around a bit on our good ol' friend Google, I came across this very interesting blog post by Sarath Pillai. I took his Perl one-liner, adjusted it a wee bit since I was curious how many files we actually had in there, and ran it on a sandbox system against a directory with 88,000 files in it:

    perl -e 'my $i=0;for(<*>){$i++;((stat)[9]<(unlink))} print "Files deleted: $i\n"'

It completed in 4.5 seconds. That's pretty good. In Sarath's tests he was able to delete half a million files in roughly a minute. Fair enough.
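For comparison, the find-based alternative mentioned above can be sketched as follows. This is not what we ran on the client system, just a minimal illustration that assumes GNU find (for -maxdepth, -printf and -delete). The function names count_files and purge_files are made up for this example. Because find walks the directory itself, no file list is ever expanded on a command line, so the "Argument list too long" limit does not come into play.

```shell
#!/bin/sh
# Sketch only: assumes GNU find. Function names are hypothetical.

# count_files DIR: print the number of regular files directly inside DIR.
count_files() {
    # Emit one '.' per file and count bytes, so unusual file names
    # (for example names containing newlines) cannot skew the count.
    find "$1" -maxdepth 1 -type f -printf '.' | wc -c | tr -d ' '
}

# purge_files DIR: delete the regular files directly inside DIR and
# print how many were removed. Subdirectories are left alone.
purge_files() {
    # -delete comes before -printf, so a '.' is only emitted for files
    # that were actually removed.
    find "$1" -maxdepth 1 -type f -delete -printf '.' | wc -c | tr -d ' '
}
```

Called as purge_files /some/sandbox/dir, this prints the number of deleted files, much like the adjusted Perl one-liner does. The -exec rm {} \; variant is the "bad idea" because it starts one rm process per file.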
After getting the OK from the client, we ran it on the big beast. It took 10 minutes.

    Files deleted: 9129797

9.1 million files. Now here comes the interesting bit. This system has been actively using 12.1.0.2 ASM since May 6th, 2015. That's only 3 months, which translates to 3 million files per month. Is this really a desirable feature? Do we need to start running Hadoop just to be able to mine the data in there? Looking at some of the files, it seems ASM is not only logging user interactions there, but also anything and everything done by any process that connects to ASM.

As I was writing this, I happened to take another peek at the directory.

    [oracle@cc1v3 audit]$ ls -1 | wc -l
    9134657

Remember those numbers from before? Three million a month? Double that. I suspect this was due to the index being full: once we had cleared it, Linux re-populated the index with the next batch of files, until it ran full again. A new syslog entry created at the same time seems to confirm that theory:

    Aug 12 00:09:11 ***** kernel: EXT3-fs warning (device ******): ext3_dx_add_entry: Directory index full!

After running the Perl one-liner again, we deleted another vast number of files:

    Files deleted: 9135386

It seems that the root cause is the timestamp that Oracle adds to the names of the audit files it writes in 12.1. The file names are now far more distinct, which gives Oracle the opportunity to generate many more of them. In previous versions, with an adequately sized file system, you'd probably have been okay for a year or more. On 12.1.0.2, on an active database (and our big beast is very active), you have to schedule a job to remove them and ensure it runs frequently (think 18+ million files in 3 months to put "frequently" into perspective).
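A scheduled cleanup job of the kind described above could look roughly like this. It is a sketch under several assumptions: GNU find is available, the audit files end in .aud, and files older than a given number of days are safe to remove. The function name purge_old_audit and the retention period are placeholders to adapt to your own retention policy.

```shell
#!/bin/sh
# Sketch only: assumes GNU find and audit files named *.aud.
# purge_old_audit is a hypothetical helper, not an Oracle utility.

# purge_old_audit DIR DAYS: delete *.aud files directly inside DIR whose
# last modification is more than DAYS days ago (find rounds a file's age
# down to whole days), then print how many were removed.
purge_old_audit() {
    dir=$1
    days=$2
    # Refuse to run against something that is not a directory.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -maxdepth 1 -type f -name '*.aud' -mtime +"$days" \
        -delete -printf '.' | wc -c | tr -d ' '
}
```

Saved in a script that ends with a call such as purge_old_audit "$ORACLE_HOME/rdbms/audit" 7, it could be run from cron as the Grid Infrastructure owner. Given the volumes seen here, hourly or daily seems more appropriate than weekly.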
