Nice!

I feel moderately pleased with myself:

That’s a work system that involves querying a table with about 16 million rows containing 12GB of free-form text (via an Oracle Text index) about five times a second, with response times over the web of less than 0.5 seconds. It probably counts as a bit of tiddler in the great scheme of things, but when I took it on, they had a quarter of the records with average response times of 26 seconds. I feel as if I’ve earned my money!

(Apropos nothing in particular, their uptime was originally less than 90 days… they were running Windows 2003. I’m just sayin’…)

It would have been longer, too, except the SysAdmin caused the rack to short one day about a year and a month ago (by plugging in a USB device somewhere. Go figure!)

Print Friendly

5 thoughts on “Nice!

  1. ales

    Hmm…0.5s vs. 26s improvement in response time is great achievement, but I don’t believe that has anything to do with OS type. It must be something at higher level, perhaps domain indexes were not used before you take on?
    I wonder how your (not so recent) PostgreSQL experiment turned on, do you think you could run your app. on PostgreSQL with close to 0.5 response time? In the last couple of months I was asked about possibility to replace Oracle with PostgreSQL, but I’m mostly skeptic about backup/recovery tools that are pathetic in comparison with Oracle RMAN…on the other hand, the price for PostgreSQL is right.

    Regards,
    Ales

    Reply
  2. dizwell Post author

    Ales, I didn’t say (or mean to imply that) the performance improvement had anything to do with the change of OS. I confined myself to remarking that *uptime* changed dramatically as a result of switching to Linux (form around 90 days max. to what you see here). I would never attribute a significant performance difference to my choice of O/S! You were supposed to look at the little graphic and note the uptime of 390 days… that’s what was “nice” about this system!

    Those performance improvements I mentioned in passing were precisely because I introduced the “correct” way of doing text searching (i.e., using Oracle Text rather than using “…like ‘%something%’…” tests), and the fact that the code I wrote was just generally a hell of a lot better than what went before (using bind variables was a good start!)

    PostgreSQL was tested with, if I recall, 8 million records, and performed well enough on simple searches to have been used in production. It began to fall down when you started searching for multiple search words at once. So, if you’re search is constrained so that only single- or dual-word search terms are allowed, it would suffice, I’d say. But if your search allows free-form text submission, forget it: Oracle Text handles that well, and PostgreSQL has nothing that gets close.

    I think you’re right about the tool-set, too: it’s a bit deficient. That’s not always a show-stopper, though: in this case, the search engine could have been re-built from scratch and re-extracted its data from a properly-protected Oracle system. Downtime would have been a bit of a bugger there, but the search engine itself didn’t particularly need to concern itself about data protection, provided its original data source did.

    But my other big no-no with PostgreSQL is that it doesn’t do very much by way of cache management: it relies almost entirely on the File System cache to do all its heavy lifting. Which is fine, I suppose, if you have control over what’s accessing the file system. I was going to try to achieve that by proposing completely dedicated search boxes. But when management said the budget would only stretch to having a couple of databases on one box, it meant that the file system would have been knocked about by both databases in a completely unmanageable fashion. Search times would have been *very* unpredictable as a result. With PostgreSQL, you are completely at the mercy of what the file system fancies doing, basically… which is not a place where I think DBAs feel particularly comfortable being!

    Reply
    1. dizwell Post author

      Well, I looked at it, but only briefly -and only as a standalone Apache project, not as something you incorporate into an Oracle database (which is a neat idea, by the way!) The documentation on it was a little difficult to get hold of, I thought. And, to be honest, it didn’t seem to offer us a lot that Oracle Text didn’t (we have no need of multi-column indexes, for example, and although indexing an IOT might have been a nice-to-have several years ago, we designed and worked around that so it was no longer an actual requirement either).

      Our budget would not have made the move feasible either: they wanted a way to ditch Oracle, not incorporate something into it! (Hence the look at standalone-Lucene, which might have been persuasive to the purse-holders, in combination with (say) MySQL or PostgreSQL).

      One other thing: the link you have to Lucene on your blog (at https://docs.google.com/View?docid=ddgw7sjp_569gf8c7cd8) says that “Oracle include a full feature Enterprise Engine named Oracle Text”. That makes it sound a bit like Enterprise Edition Oracle is required to use Oracle Text -but it isn’t (as you can see at http://www.oracle.com/us/products/database/product-editions-066501.html?ssSourceSiteId=otnen). It’s actually available in *all* database editions… even Express, which surprised me a little!

      Anyway, the idea of incorporating Lucene into Oracle sounds pretty clever, and thank you for the pointers to how to give it a whirl. Definitely something I’ll experiment with as soon as I can!

      Reply
  3. ales

    Thanks for verbose explanation above, it’s appreciated! Well, I’m sorry, it’s my fault not reading your post carefully, the response time improvement took all my attention…it’s because I take high uptime for Windows 2003 (and newer) as granted (I don’t see any difference in uptime on this front with our Linux boxes, I’m afraid). That said, I must admit, that since servers run in isolated env. I don’t celebrate “patch Tuesday” each month. I get excited when I approach 1000 days (I came as close as to 985 days), then I know some natural disaster is around the corner (on the 986 day 150 years flood record was broken, taking down pretty much every power transformer station around us)…. or perhaps some coworker will walk in the server room with his brand new (2.99$) voltmeter from Hong Kong, bringing down every fuse in the room :) .
    I’m sure you (personally) could keep up Windows box alive for months and years as you do with Linux! (Of course, this is in direct contradiction with your home state of affairs, I’m afraid, there I don’t expect OS to last more than a couple of months ;-) .
    Anyway, thanks for reminding me on another PostgreSQL shortcoming, the fact that developers cut a corner with file system cache. I don’t think this can scale very well above small (up to 20GB?) databases, depending on OS underneath.
    Regarding built-in Text in Express edition (spatial Locator is another such feature) I believe the true reason is not Oracle generosity, but carefully calculated strategy of bringing Apex to the masses. It would not make sense if they crippled Apex on this (search) front.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>