MongoDB 3.0: mongos CentOS 6 service scripts

Premise

When installing mongos (not mongod) on CentOS 6 using the RPMs provided in MongoDB’s repository, no startup/service/init.d scripts are created (nor much of anything else) to aid in managing mongos as a service. All the RPM provides is the binary.

If you want to treat mongos as a service, there are quite a few steps to follow beyond just setting up a script in init.d:

  • mongos YAML configuration file
  • mongos sysconfig service overrides
  • SELinux port definition
  • a user to run mongos

Most of the time there are many mongos processes distributed within a MongoDB cluster. Configuring each of these can be a hassle, let alone installing and managing them, unless you use an automation tool.

Be sure to read “Install MongoDB on Red Hat Enterprise or CentOS Linux” first.

mongos Service Scripts

Below are the 3 files I use when preparing a mongos cluster:

  1. The init.d service script for managing the mongos process
  2. A base mongos configuration file in YAML
  3. An automation script to prepare the CentOS 6 environment for mongos

/etc/init.d/mongos

The following init.d script has been retooled from the stock MongoDB 3.0 mongod for mongos. Watch out for the subtle differences.
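Since the script itself is hosted elsewhere, here is a condensed sketch of its shape. This is not the verbatim script; the sysconfig filename and the pidfile handling are assumptions modeled on the stock mongod init script.

```shell
#!/bin/bash
# mongos - startup script for the mongos shard router
#          (adapted from the stock MongoDB 3.0 mongod script)
# chkconfig: 35 85 15
# config: /etc/mongos.conf

. /etc/rc.d/init.d/functions

CONFIGFILE="/etc/mongos.conf"
OPTIONS=" -f $CONFIGFILE"
mongos=${MONGOS-/usr/bin/mongos}          # the binary is mongos, not mongod
MONGO_USER=mongod
MONGO_GROUP=mongod

# Unlike mongod there is no dbpath; pull the pidfile path from the YAML config
PIDFILEPATH=$(awk -F: '/pidFilePath/ { print $2 }' "$CONFIGFILE" | tr -d '[:blank:]"')

[ -f /etc/sysconfig/mongos ] && . /etc/sysconfig/mongos

start() {
  echo -n $"Starting mongos: "
  daemon --user "$MONGO_USER" --check "$mongos" "$mongos $OPTIONS >/dev/null 2>&1"
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/mongos
}

stop() {
  echo -n $"Stopping mongos: "
  killproc -p "$PIDFILEPATH" -d 300 "$mongos"
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/mongos
}

case "$1" in
  start)   start ;;
  stop)    stop ;;
  restart) stop; start ;;
  status)  status -p "$PIDFILEPATH" "$mongos" ;;
  *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac
exit $RETVAL
```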

/etc/mongos.conf Base YAML Configuration

Be sure to set sharding.configDB  appropriately.
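For reference, a minimal /etc/mongos.conf along these lines might look like the following. The hostnames and ports are placeholders; note that mongos takes no storage section, which is one of the subtle differences from a mongod config.

```yaml
# /etc/mongos.conf - base mongos configuration (YAML)
systemLog:
  destination: file
  path: /var/log/mongodb/mongos.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongos.pid
net:
  port: 27017
sharding:
  configDB: cfg1.example.local:27019,cfg2.example.local:27019,cfg3.example.local:27019
```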

Post-Install Environment Setup

Provided mongodb-org-mongos has already been installed, the following script will do most of the legwork to get the CentOS 6 environment up and running. Note: The script is pulling the configuration file and service file dependencies from a web server using wget; this should be tailored. It performs the following:

  • Add a mongod user (yes, with a d) with the same settings that would be supplied by mongodb-org-server.
  • Add a placeholder file for sysconfig.
  • Add the mongos.conf file with the base configuration noted above (wget).
  • Add the init.d service script (wget).
  • If SELinux is enabled, set the contexts for the files created above, and add the port definition via semanage. Note that it will automatically install the required tools to perform this task if necessary.
  • Add and enable the new mongos service.
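The steps above can be sketched as follows. The wget URLs are placeholders for wherever you host the two files, and the SELinux port/type are assumptions for a default mongos on 27017.

```shell
#!/bin/bash
# Prepare a CentOS 6 host to run mongos as a service.
# Assumes mongodb-org-mongos is already installed.

# 1. mongod user/group, matching what mongodb-org-server would create
getent group mongod >/dev/null || groupadd -r mongod
getent passwd mongod >/dev/null || \
  useradd -r -g mongod -d /var/lib/mongo -s /bin/false -c "mongod" mongod

# 2. sysconfig placeholder for service overrides
touch /etc/sysconfig/mongos

# 3 & 4. base configuration and init.d script (tailor these URLs)
wget -q -O /etc/mongos.conf   http://files.example.local/mongos.conf
wget -q -O /etc/init.d/mongos http://files.example.local/mongos.init
chmod 755 /etc/init.d/mongos

# 5. SELinux: restore contexts and add the port definition
if selinuxenabled; then
  yum -y install policycoreutils-python      # provides semanage if missing
  restorecon /etc/mongos.conf /etc/init.d/mongos /etc/sysconfig/mongos
  semanage port -a -t mongod_port_t -p tcp 27017
fi

# 6. register and enable the service
chkconfig --add mongos
chkconfig mongos on
```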

SELinux and MongoDB

The automation script above does not add all of the SELinux policies required for mongos. Install the mongodb-org package to initialize all of the policies; it can be removed later if need be while keeping these policies intact. Finding the script used by MongoDB (or writing one) is a task for another day. A policy dump from semanage is as follows:

 

Migrating Kickstart from CentOS 6 to CentOS 7 + ESXi VMware Tools

In another post I described how to install CentOS 6 via a kickstart file (be sure to check out the “Kickstart Sample Configuration” section). CentOS 7 was recently released, and with that I needed to also use a kickstart configuration. However, simply using the previous kickstart configuration was not as easy as copy-and-paste (besides updating the release version in the repository configuration).

Summary of Kickstart Changes

There were a few changes that needed to be made for a base/core installation of CentOS 7:

  • Include  eula --agreed (read the documentation)
  • Include  services --enabled=NetworkManager,sshd (read the documentation)
  • Update the install packages list ( %packages  section)
  • CentOS 7 is also a bit more strict with the kickstart file, so I had to explicitly include %end where applicable
  • CentOS 7’s default file system is now xfs (in CentOS 6 it was ext4), so consider updating the automatic partitioning to use xfs
  • Package groups @scalable-file-systems, @server-platform, @server-policy, and @system-admin-tools no longer exist – I haven’t located suitable replacements yet
  • Things like ifconfig are no longer included by default (they are now deprecated), so if you need them be sure to include net-tools. You should be using ip by now anyway.
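Pulling the bullet points together, the relevant CentOS 7 kickstart additions look roughly like this. The partition sizes and volume group name are illustrative only:

```
eula --agreed
services --enabled=NetworkManager,sshd

# xfs is now the default file system
part /boot --fstype=xfs --size=500
part pv.01 --size=1 --grow
volgroup vg0 pv.01
logvol / --fstype=xfs --name=root --vgname=vg0 --size=1024 --grow

%packages
@core
net-tools
%end
```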

Kickstart Sample Configuration

And now, an updated kickstart config for CentOS 7, with consideration of the previously mentioned updates (compare it with the previous Kickstart Sample Configuration mentioned in the other post). I also chose to include some extra packages that don’t exist by default with a @core  installation.

 

VMware Tools Change

If you’re like me and use ESXi (I’m currently on 5.5 and 5.5u1), the installable tools for integrating with ESXi are a nice treat. However, the repository location has changed specifically for RHEL7, and so have the packages.

Add VMware Tools to YUM

Put the following repo configuration in /etc/yum.repos.d/vmware-tools.repo:
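The configuration looks like the following; double-check packages.vmware.com for the path matching your ESXi version (the 5.5/rhel7 path below is an assumption based on my setup):

```ini
[vmware-tools]
name=VMware Tools for RHEL7/CentOS 7
baseurl=http://packages.vmware.com/tools/esx/5.5/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=http://packages.vmware.com/tools/keys/VMWARE-PACKAGING-GPG-RSA-KEY.pub
```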

Then update yum and install the tools:

 

 

uberSVN: Cheap Backup Script Using hotcopy and rdiff-backup

SVN backups, and backing up in general, is critical for the purpose of disaster recovery. There are a few requirements to consider:

  1. Backups should mirror (or come close to) the current state of the system, i.e. a snapshot.
  2. Backups should be incremental. In the event a snapshot is malformed, or the system during the snapshot is corrupt, one can roll back to a previously working snapshot.
  3. Back up only what you need to survive. No sense in including temporary files, libs that are part of the distribution (easily downloadable), etc.

Lower-level note: I don’t have the resources for LVM snapshots, and the physical disks are RAID 6.

Considerations

Originally, I was backing up my uberSVN installation and all of the SVN repositories using a simple rdiff-backup command. This approach is shortsighted: an rsync of the internal repositories directory does not consider current commits, prop changes, hooks, etc. that occur while the rsync is happening, which could produce a non-restorable backup. Using svnadmin hotcopy addresses this concern. However, the issue with hotcopy is that it does not perform an incremental backup for you, so I needed to couple it with rdiff-backup. It is worth noting that this type of copy includes props, hooks, commits and other information; it is more comprehensive than a normal svnadmin dump, but with the downside that it is only truly compatible with the version of SVN that generated it.

As if this wasn’t enough to think about, hotcopy does not preserve file system timestamps. This is problematic with rdiff-backup which relies on a timestamp + file size combination; even though it uses the same underlying algorithm as rsync, AFAIK it does not support the checksum feature for transfers. So after the svnadmin hotcopy is performed, file attributes should be synchronized as well (with slight risk I might add).

Lastly, uberSVN has a handful of configurations and metadata that must be backed up per installation. Its GUI has an easy backup feature, but there is no CLI access/equivalent that I could find. I’m sure I could dig through the source and figure out exactly what was included in the backup, but I decided to reverse-engineer the backup file (which uses ZIP compression, by the way) and infer other details. uberSVN includes (or should include) the following directories from its install directory (default is /opt/ubersvn): /opt/ubersvn/{conf,openssl,tomcat,ubersvn-db}. From this, a backup archive can be carefully constructed for later restoration within the GUI.

The Implementation (tl;dr)

This is a bash script that is configurable in the first two groupings of variables (the relevant lines are highlighted). It also uses a subsys lock mechanism (flock) so it cannot be run in parallel, which is helpful when running it from a crontab. I haven’t extensively tested its reliability, but it does work… generally speaking. Here’s the hack:
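In outline, the script does the following. This is a simplified sketch of the approach rather than the full hack; all paths and the retention window are placeholders.

```shell
#!/bin/bash
# uberSVN + repository backup: hotcopy each repo, sync timestamps,
# then take an incremental snapshot with rdiff-backup.

LOCKFILE=/var/lock/subsys/ubersvn-backup
REPO_ROOT=/opt/ubersvn/repositories     # live repositories
STAGE=/backup/stage                     # hotcopy target
DEST=/backup/rdiff                      # rdiff-backup archive
UBERSVN=/opt/ubersvn

(
  flock -n 9 || exit 1                  # refuse to run in parallel (cron-safe)

  # 1. consistent, restorable copies of each repository
  for repo in "$REPO_ROOT"/*; do
    svnadmin hotcopy "$repo" "$STAGE/repos/$(basename "$repo")"
  done

  # 2. hotcopy does not preserve timestamps, which rdiff-backup relies on;
  #    sync attributes of same-size files without re-copying content
  #    (this is the "slight risk" mentioned above)
  rsync -a --existing --size-only "$REPO_ROOT/" "$STAGE/repos/"

  # 3. uberSVN configuration/metadata for GUI restoration
  tar -czf "$STAGE/ubersvn-conf.tar.gz" -C "$UBERSVN" \
      conf openssl tomcat ubersvn-db

  # 4. incremental snapshot, keeping four weeks of increments
  rdiff-backup "$STAGE" "$DEST"
  rdiff-backup --remove-older-than 4W "$DEST"
) 9>"$LOCKFILE"
```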

 

Scripting Parallel Bash Commands (Jobs)

Sometimes commands take a long time to process (in my case importing huge SQL dumps into a series of sandboxed MySQL databases), in which case it may be favorable to take advantage of multiple CPUs/cores. This can be handled in shell/bash with the background control operator & in combination with wait. Got a little insight from this StackOverflow answer.

The Technique

The break-down is commented inline.
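Since the snippet itself lives in a gist, here is a minimal reconstruction of the technique; the task names, the sleep stand-in, and MAX_JOBS are arbitrary.

```shell
#!/bin/bash
# Run tasks in parallel batches using & (background) and wait.

MAX_JOBS=2
pids=()

# read builds a multiline list; a bash array would work just as well
read -r -d '' TASKS <<'EOF'
alpha
beta
gamma
delta
EOF

results_file=$(mktemp)

for task in $TASKS; do
    # Stand-in for a long-running command (e.g. a huge SQL import)
    ( sleep 0.1; echo "$task" >> "$results_file" ) &
    lastPid=$!                 # $! = PID of the command just backgrounded
    pids+=("$lastPid")

    # Once the batch is full, block until every PID in it has finished
    if [ "${#pids[@]}" -ge "$MAX_JOBS" ]; then
        wait "${pids[@]}"
        pids=()
    fi
done

wait    # catch any stragglers from a partially filled final batch

echo "completed $(wc -l < "$results_file") tasks"
```

Note this runs in fixed batches: a new task only starts once the whole batch completes, which is exactly the pool limitation called out in the notes.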

Notes

  • This does not function like a pool (task queue) where a freed-up “thread” immediately becomes eligible for the next task, although I don’t see why something like that couldn’t be implemented with a little work.
  • wait pauses script execution until all of the PIDs it is given have completed before moving on.
  • Once the if...fi control block is entered, wait will cause the for...in loop to suspend.
  • $! (or ${!}) contains the PID of the most recently backgrounded command; make sure it is read directly after the operation used with &. Throw it into a variable (like lastPid) for future use.
  • This is not multi-threading, although this simplified concept is similar. Each command spawns a separate process.
  • read  is just creating a multiline list; in my specific case this was favorable over a bash array, but either will work.
Screenshot of htop showing parallel process tree.

Reload DNS Zone with Bind9 and rndc

Was really confused when trying to reload a zone. I would receive the error rndc: 'reload' failed: dynamic zone. This was especially frustrating because in a similar installation, a simple service named reload would do just fine; not this time. Also, remember the point is to not completely reload named; doing so will clear its cache (if used as a local DNS resolver) and cause a small outage if clients do not have their current domain lookup cached.

Assuming the zone to update is called “THEZONE” perform the following:
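For a dynamic zone, the sequence that works is:

```shell
rndc freeze THEZONE     # flush the journal and stop dynamic updates
rndc reload THEZONE     # reload the hand-edited zone file
rndc thaw THEZONE       # re-enable dynamic updates
```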

If necessary, don’t forget to update the reverse DNS records. I’m lazy and use PTR:

So… the moral of the story is to  freeze, reload and thaw. Remember f-r-t (or fart?).

Install CentOS 6 with Anaconda/Kickstart (plus ESXi VMware Tools)

Synopsis

I’ve been getting my feet wet with ESXi, CentOS 6 VMs, and YUM/RPM. Well the last two I have been using for years, but not like recently.

The goal is to be able to blindly install a controlled distribution of CentOS 6.x quickly and without error (maybe even install multiple at the same time). What I needed:

  • Anaconda Kickstart file (ks.cfg)
  • Local mirrored repository for CentOS 6.x (6.3 in my example)
  • Custom 3rd party repo
  • HTTP/NFS/RSYNC access to these
  • Variable disk/cpu/ram size – the partitions need to be dynamic

Without writing a book about all of this, I really want to just highlight some problems I ran into and how I solved them.

An example of my kickstart file is below for reference.

Automated Partition Schema

Since I’m in the world of virtualized hardware, it is important for the disk to scale easily without losing data. The prerequisite to this is of course Logical Volume Management (LVM). Now you may not agree with my LVM layout (and honestly, optimizing disk partitions isn’t my expertise), but at the least there must be “boot,” “root,” and “swap” partitions.

The goal here is to make the root partition grow to its maximum size without negating the swap. Also, the boot partition won’t be on the LVM, it will be fixed in the MBR. The kickstart section is as follows:
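The section in question looks roughly like this; the volume group name and sizes are examples:

```
zerombr yes
clearpart --all --initlabel

# /boot lives outside the LVM, fixed size
part /boot --fstype=ext4 --size=512 --asprimary
part pv.01 --size=1 --grow

volgroup vg_root pv.01
# root grows to fill the disk; swap stays fixed
logvol /    --fstype=ext4 --vgname=vg_root --name=lv_root --size=1024 --grow
logvol swap --fstype=swap --vgname=vg_root --name=lv_swap --size=2048
```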

I do want to note that the logvol entries are interpreted in no particular order, so it is perfectly fine for the swap logvol to be declared after the root (/) logical volume.

Bypassing “Storage Device Warning”

The only problem I had in regards to a prompt-less install was the “Storage Device Warning” asking if I was sure I wanted to write to the disk and lose all of my data. No matter what I put in the partition specification of kickstart, it would always prompt. The answer is to use zerombr yes. See the option “zerombr” as defined within the CentOS kickstart guide. This can be placed anywhere in the kickstart file (except within %packages, %post or similar sections); just put it up near the top.

Auto Reboot

After the installation is complete, automatically reboot the machine. This works perfectly in ESXi since it automatically unmounts the virtual cdrom after the first boot of the guest! Simply put  reboot anywhere in your kickstart – near the top is probably best.

VMware Tools RPM

In order for the vSphere Client to monitor and execute certain tasks on the guest vm, VMware Tools is required. This will show you things like IP addresses, hostnames and guest state as well as integrated shutdown/reboot tools.

Add VMware Tools to YUM

Put the following repo configuration in /etc/yum.repos.d/vmware-tools.repo:
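The configuration looks like the following; check packages.vmware.com for the path matching your ESXi version (the 5.0/rhel6 path below is a placeholder):

```ini
[vmware-tools]
name=VMware Tools for RHEL6/CentOS 6
baseurl=http://packages.vmware.com/tools/esx/5.0/rhel6/x86_64
enabled=1
gpgcheck=1
gpgkey=http://packages.vmware.com/tools/keys/VMWARE-PACKAGING-GPG-RSA-KEY.pub
```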

Then execute the following shell in %post of kickstart:

The important part to mention here is that the package is called vmware-tools-plugins-guestInfo. All the dependencies will come with it, so no worries there.

Mirroring a Repository for NFS Kickstart Installation

Create the Repo Mirror

Remember, my goal is to be able to quickly add a CentOS VM. With that, I don’t want to wait 30 minutes to pull down packages from a mirror in Iowa, New York or Cali. I want to pull it down once, keep it up-to-date and have my local install pull from my local mirror. For simplicity’s sake, I’ll put the mirror in /repo/centos.
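The initial pull looks something like this; swap the mirror host for one close to you (it must speak the rsync protocol):

```shell
mkdir -p /repo/centos
rsync -avSHP --delete \
      --exclude "local*" --exclude "isos" \
      mirrors.example.org::centos/6.3/ /repo/centos/6.3/
```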

I am choosing to exclude any local files/directories (“local*”) and also the huge DVD ISOs (“isos”). Also note that the mirror format is host::path and that the mirror host must support the rsync protocol.

Keep the Local Mirror Updated

To keep the local repo copy up-to-date, run this script via cron (by the way, I stole this from somewhere, I just don’t remember). Please don’t forget to swap out the mirror hostname and path with something that makes more geographical sense to you.

Configure NFS for Kickstart Network Installations

NFS server support is built into CentOS and running by default, so this is pretty easy. Add the following to /etc/exports:
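Mine looks like this:

```
/repo/centos 172.16.0.0/16(ro,sync,all_squash)
```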

This exports the directory “/repo/centos” for NFS. Only the subnet 172.16.0.0/16 is allowed access (no credentials required). It is mounted read-only (ro), connections are synchronous as opposed to asynchronous (sync), and all connections are anonymous for security purposes (all_squash). See man exports(5) if you need more help.

Restart NFS via  service nfs restart.

I feel like I’m missing something with NFS, but I don’t recall; this was too easy. In my memory there was a struggle with rpc!

Update iptables for NFS

Edit /etc/sysconfig/iptables and throw these rules in there before -A INPUT -j REJECT --reject-with icmp-host-prohibited.
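At minimum that means the portmapper and nfsd ports; if you pin mountd to a fixed port in /etc/sysconfig/nfs (MOUNTD_PORT), open that one as well:

```
-A INPUT -m state --state NEW -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -p udp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 2049 -j ACCEPT
-A INPUT -m state --state NEW -p udp --dport 2049 -j ACCEPT
```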

And restart iptables via  service iptables restart.

Configure Kickstart to Use Local Repo via NFS

This is an easy one-line if everything is set up correctly. Add the following after the “install” option within the kickstart configuration.
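With the export in place, the line is as follows; the server IP and directory layout are from my environment, so adjust to taste:

```
install
nfs --server=172.16.0.10 --dir=/repo/centos/6.3/os/x86_64
```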

Use Local Repo Post Install

So you want to keep using your new local repo beyond the kickstart installation? No worries. Install apache, configure the vhost and update ks.cfg.

Inside vhosts.conf:
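Something simple like this will do (Apache 2.2 syntax; the hostname is a placeholder):

```apache
<VirtualHost *:80>
    ServerName repo.example.local
    DocumentRoot /repo/centos
    <Directory "/repo/centos">
        Options Indexes FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
```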

Add the following rule to iptables:

Restart iptables via service iptables restart;.

Start httpd via  service httpd start; chkconfig httpd on;.

Update the kickstart configuration:
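Swap the nfs line for a url directive pointing at the new vhost (hostname is a placeholder):

```
install
url --url http://repo.example.local/6.3/os/x86_64
```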

Done!

Kickstart Sample Configuration

For the option “rootpw”, use grub-crypt with the hash algorithm specified by authconfig --passalgo=X (to replace DEFAULT_SALTED_ROOT_PASSWORD). In the sample ks.cfg file, I have sha512, so:
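For example:

```shell
grub-crypt --sha-512
# paste the resulting hash into: rootpw --iscrypted <hash>
```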

Using the Kickstart Configuration

The idea is to create a custom ISO with the kickstart configuration embedded, but I haven’t done this yet. So for now, I’m hosting the file as ks.cfg on an intranet HTTP server and booting a CentOS 6.3 netinstall (~200 MB). At the bootloader prompt, specify the extra parameters vmlinuz initrd=initrd.img ks=http://some.host.local/ks.cfg. This installs all the packages, updates as needed, partitions the disk, runs a custom script, and reboots the machine.

Brain dump complete.

Installing Jenkins CI on CentOS 6.x (Tomcat with an AJP Proxy)

This is on a fresh minimal install of CentOS 6.3 (but should work for 6.x and also to my knowledge works for the latest versions of 5.x). Judging by how easy this is and how aligned CentOS is with normal RedHat (RHEL), it should work on any RHEL-based system 5.x or 6.x but don’t take my word for it.

Don’t copy and paste this thing, it is a guideline, some of it won’t work if you copy and paste.

First get CentOS updated all the way in a safe session (I like to use screen, make sure it is installed though):

Install the Jenkins RPM via YUM as described on the Red Hat repository page for Jenkins; at the time of this writing the commands are:
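Reconstructed from that page (verify against the current version before running):

```shell
wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
rpm --import http://pkg.jenkins-ci.org/redhat/jenkins-ci.org.key
yum install jenkins
```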

Don’t forget, you’ll need java as well (at least a JRE, but I can only seem to find the JDK, which will be overkill but sufficient):

Install httpd (Apache 2.2). It is bad practice to bind Tomcat (which Jenkins uses) to port 80. Tomcat is a service, not a web server. Apache will be used to proxy the requests to the Tomcat service and thus Jenkins through port 80 (the normal www port):

At this point neither Apache nor Jenkins should be running. Update iptables and open TCP port 80 (no need to open port 8080, which Tomcat uses; all proxy comms will happen via the loopback):

And the iptables file is:
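The relevant rule, placed before the final REJECT line:

```
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
```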

Close vi (:wq), and restart iptables:

Now, configure Apache. Update the ServerName and any other necessary configurations. I’ll leave that up to you (the reader). What you DO need to know is the virtual host proxy configuration. I’ll be using AJP (Apache JServ Protocol). Some say it is slower, others say it is faster than a normal proxy configuration. I have seen AJP function superbly on an enterprise-level system and I have never had any problems with it.

First make sure the module is loaded:
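A quick grep against the main configuration does the trick:

```shell
grep proxy_ajp /etc/httpd/conf/httpd.conf
```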

Should yield the result: “LoadModule proxy_ajp_module modules/mod_proxy_ajp.so”

This is enabled and available by default within CentOS. Getting/building a proxy_ajp module is a PITA, and if it is not available to you, that is outside the scope of this doc (although I have done this before and have been successful – maybe I’ll write a guide later).

Now configure the vhost (this file didn’t exist, so vi will create it for me):

And add:
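A vhost along these lines; port 8009 is the conventional AJP default, so confirm it matches what Jenkins is configured to listen on (see /etc/sysconfig/jenkins):

```apache
<VirtualHost *:80>
    ServerName jenkins.host.tld

    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    ProxyPass / ajp://127.0.0.1:8009/
    ProxyPassReverse / ajp://127.0.0.1:8009/
</VirtualHost>
```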

Save, quit vi (:wq), and start httpd for the first time. It should start without a problem; Tomcat (Jenkins) does not need to be running for this to work – the proxy will simply timeout and fail until the downstream service is online.

Go ahead and start Jenkins for the first time:

Navigate to the domain you installed it under (in this example I used jenkins.host.tld). Happy building!

After Jenkins is installed consider the following plugins:

  • Confluence Publisher Plugin
  • Publish Over FTP (although I prefer LFTP)
  • Role-based Authorization Strategy
  • LDAP Plugin (installed by default)

Lastly, Jenkins support for SVN 1.7 is still up in the air. According to #JENKINS-11381, it is complete, but I haven’t had a chance to install and play with SVNKit 1.7 support.

Setting up BIND 9 on CentOS 6 and Securing a Private Nameserver on the Internet

Today I was setting up a brand new server over at LiquidWeb (I have been hosting with this Lansing, MI based company for years, although I’m stubborn and have never tried out their heroic support). I already had the IP addresses (2) and the box provisioned. It is a clean install of the latest CentOS 6 – that means no cPanel/WHM, Plesk or similar. The box will serve many purposes, but it also needs its own nameserver. For the sake of this tutorial, the example domain will be putthingsdown.com and the two IP addresses my host provided are 11.22.33.44 and 11.22.33.45.

Register Your Private Nameservers at Your Registrar

My one-and-only registrar is GoDaddy. They keep things simple and allow for flexibility as far as domain management goes. They are just my registrar: I do not host with them, use their mail servers, nor their nameservers.

This part is simple: when you register the domain name, navigate to the domain management tool and update the nameservers to “ns1.putthingsdown.com” and “ns2.putthingsdown.com” – these do not have to exist yet and will be created below.

Lastly, register the nameservers and a utility host at GoDaddy by adding the following three “Host” entries (not subdomain entries, but host entries – there is a difference):

  1. Hostname is ns1 and IP address is 11.22.33.44.
  2. Hostname is ns2 and IP address is 11.22.33.45.
  3. Hostname is host and IP address is 11.22.33.44.

Configuring named

  1. First, get the named service installed:  yum install bind-chroot
  2. Notice that bind-chroot will install under /var/named in its own chrooted directory. This is for security purposes. There should be hardlinks to the chrooted “data” and “slave” directories (this was updated with EL6 – yay!)
  3. Configure the rndc key (use the ampersand to send this process to the background; it can take 10–15 minutes to generate enough entropy):  rndc-confgen &
  4. Secure the newly generated rndc.key file:  chown named /etc/rndc.key; chmod 600 /etc/rndc.key;
  5. Get the rndc key name, which is encapsulated in double quotes in the generated file (it should be rndc-key by default):  grep key /etc/rndc.key
  6. Note on CentOS 6 and bind 9.7.3, there will not be a /etc/rndc.conf
  7. Create your first zone file in /var/named/data (example below):  vi /var/named/data/putthingsdown.com.zone
  8. Configure named (example below):  vi /etc/named.conf

Example Master/Authoritative Zone File

  •  The “serial” should be updated every single time this zone file is modified and reloaded into named. The format is as follows: YYYYMMDDnn (where nn is an incrementor for the same day, e.g. 01, 02, 03… 11, 12, 13)
  • refresh, retry, expire, and minimum are measured in seconds, just a note that these aren’t always followed, especially by residential DNS mirrors/servers.
  • The SOA (Start of Authority) is the… well… start of the zone record. This section (including the parens) is what kickstarts the zone file and defines the metadata.
  • After the SOA, the first two records are NS (NameServer): the TTL (time-to-live) is 86400 seconds, or 1 day, and they point to the (non-existent, yet) nameservers ns1 and ns2.
  • The next 2 records are A (Address) records that register the ns1 and ns2 subdomains and bind them to IP addresses – now the two NS records have something to point to.
  • The third A record is the actual domain itself which is bound to the primary IP address. This is proof that you really don’t need the “www” in front of the domain name, although this is also dependent on the web server configuration
  • “host” will serve as a utility subdomain which also points to the primary IP. This is helpful in the future if there is a secondary util server used for SSH to manage all the servers within the private network.
  • Lastly, the www subdomain acts as a CNAME (Canonical Name), or alias, to putthingsdown.com – essentially putthingsdown.com and www.putthingsdown.com will take you to the same place. Hint: although CNAME records make your zone more flexible for future enhancements, try not to use them if you don’t have to, since they require a secondary lookup.
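Putting those bullets together, a minimal zone file for this example looks like the following (the serial and IPs are this post’s example values):

```
$TTL 86400
@       IN  SOA  ns1.putthingsdown.com. hostmaster.putthingsdown.com. (
                 2012060101  ; serial (YYYYMMDDnn)
                 3600        ; refresh
                 900         ; retry
                 604800      ; expire
                 86400 )     ; minimum
        IN  NS   ns1.putthingsdown.com.
        IN  NS   ns2.putthingsdown.com.
ns1     IN  A      11.22.33.44
ns2     IN  A      11.22.33.45
@       IN  A      11.22.33.44
host    IN  A      11.22.33.44
www     IN  CNAME  putthingsdown.com.
```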

Example named Configuration

The lines highlighted below are the ones that changed from the default named.conf provided by the installation. I am not going to go over this in detail, but I do want to highlight a few pieces. Note: this is the insecure version of the named configuration; continue reading for security enhancements.

  • The first include is to the rndc.key file that was generated in step 3 above.
  • The “trusted” ACL has your two IP addresses in it.
  • Within the controls declaration, the key name found in step 5 above should be defined.
  • “listen-on” needs to have both IP addresses listed – named binds itself to port 53 on each of them.
  • Setting “allow-query” to any will allow any upstream DNS server the ability to query yours.
  • The section at the bottom for the zone is the inclusion of the zone file created previously.
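In skeleton form, the changed pieces are as follows (a sketch; the stock options from the default file are omitted, and the key name assumes the rndc-key default):

```
include "/etc/rndc.key";

acl "trusted" {
    11.22.33.44;
    11.22.33.45;
};

controls {
    inet 127.0.0.1 allow { localhost; } keys { "rndc-key"; };
};

options {
    directory "/var/named";
    listen-on port 53 { 11.22.33.44; 11.22.33.45; };
    allow-query { any; };
};

zone "putthingsdown.com" IN {
    type master;
    file "data/putthingsdown.com.zone";
};
```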

Almost Done: Test Stuff

Before we take the step to secure the named server, let’s make sure it works first. Restart the service (hopefully it doesn’t throw any errors). Once it restarts successfully let’s start it on boot-up (with its default run levels); make sure the chkconfig took:
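That is:

```shell
service named restart
chkconfig named on
chkconfig --list named    # confirm it is on for the default runlevels
```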

I’ll assume port 53 is open for TCP and UDP. Honestly, this isn’t me “not knowing” which protocol; DNS primarily uses UDP, but falls back to TCP for large responses and zone transfers.

Lastly, use a public tool on the web to verify your DNS configuration. I like to use NsLookup by Network-Tools.com. Simply put “putthingsdown.com” in the domain field and hit GO. If everything is set up correctly, some records from the zone file will be listed, specifically the SOA and NS, as well as the primary A.

Securing named

After some super-fast searching, I found this nifty BSP (not sure what BSP stands for?) over at NIST.gov: How to Secure a Domain Name Server (DNS). Here’s the gist of §3.1.2 (most of these are snippets, they shouldn’t be interpreted verbatim):

  • override “version” number using  options { version "dunno kthxbye"; };
  • restrict zone transfers:  options { allow-transfer { localhost; trusted; }; };
  • restrict dynamic updates in each zone:  zone "putthingsdown.com" { allow-update { localhost; trusted; }; };
  • protect against DNS spoofing:  options { recursion no; };
  • restrict by default all queries:   options { allow-query { localhost; trusted; }; };
  • allow individual zone queries:  zone "putthingsdown.com" { allow-query { any; }; };
  • Verify with security tools: DNSWalk (online version) zone transfer should fail with “REFUSED”, ZoneCheck, and dlint. Happy hunting.

CentOS 5.8 + Apache 2.2 + PHP 5.3 + suPHP 0.7.1

So I’m a bit of a purist when it comes to CentOS administration. CentOS is built on the idea of stability and sustainability. Without the addition of extra 3rd-party repositories, it provides the bare necessities to run a reliable and secure server. Don’t get me wrong though, there are plenty of great packages out of the box (from OpenSSL, Apache, PHP to OpenLDAP, PostgreSQL and then some), but sometimes you need some heavy-duty next-gen power tools like ffmpeg, nginx, OpenVPN or suPHP. Most of these packages are not available from the “CentOS Certified” base, extras and updates repositories; in fact, you can’t get them via yum without adding a third-party repo like RPMForge.

With that said, I need suPHP for a PHP staging environment. I’m not going to talk about what suPHP is, you can read about it on your own time. Going back to me being a purist, I don’t use RPMForge repos or anything similar. I like to stick to base and extras only and since there isn’t a suPHP RPM available – I’ll have to build it myself. The proper way to do this is to build it as an RPM (Red Hat Package Manager) and install via yum from the locally built RPM, but for whatever reason I can never get myself to do it this way.

Reminder, suPHP can only use PHP CGI, not PHP CLI (so look for a php-cgi binary, not just a php one)

Download & Building suPHP from Source

Before we start, make sure you have dev tools:

We’ll also need development packages for httpd (Apache 2.2), php53 (PHP 5.3), and apr (Apache Runtime Libraries and Utilities):
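On a stock CentOS 5.8, the two steps above amount to:

```shell
yum groupinstall "Development Tools"
yum install httpd-devel php53-devel apr-devel
```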

Now create a working directory, download the suPHP src, configure it and build (make). Note that you need to figure out where the apr config is located, mine is at /usr/bin/apr-1-config
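The sequence looks roughly like this. The download URL and configure flags are from the suPHP 0.7.1 docs, so verify them against your copy; paranoid setid mode is my choice, not a requirement:

```shell
mkdir -p ~/src && cd ~/src
wget http://www.suphp.org/download/suphp-0.7.1.tar.gz
tar -xzf suphp-0.7.1.tar.gz && cd suphp-0.7.1

./configure --with-apxs=/usr/sbin/apxs \
            --with-apr=/usr/bin/apr-1-config \
            --with-setid-mode=paranoid
make
make install
```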

Configure Apache + PHP to use suPHP

I’ll admit, I relied heavily on the suPHP docs, but even then it was not 100% complete. That, and sites like this one didn’t provide any useful information – I’m mainly aggravated that they used RPMForge and did not use php53 packages. But, after some re-reading, reinterpreting and trial & error, I’m up and running… and this is how it went (starting to get tired of writing this post, this will be short and sweet):

Important Files

  • /usr/local/etc/suphp.conf (this is the core suPHP configuration)
  • /etc/httpd/conf.d/suphp.conf (this is the Apache mod_suphp configuration… needed to create this)
  • /etc/httpd/conf.d/php.conf (this is the php configuration that I had to disable)
  • /etc/httpd/conf/httpd.conf (for some of the primary virtual hosts… all my other vhosts are in separate files)

suPHP Core Configuration

/usr/local/etc/suphp.conf, I based it off of the suphp.conf-example file located in the source code’s doc directory. This is an ini-style configuration:
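The important bits ended up close to the example file; the paths and min uid/gid are environment-specific:

```ini
[global]
logfile=/var/log/suphp.log
loglevel=info
webserver_user=apache
docroot=/var/www
allow_file_group_writeable=false
allow_directory_group_writeable=false
check_vhost_docroot=true
umask=0077
min_uid=500
min_gid=500

[handlers]
; suPHP drives the CGI binary, not the CLI one
x-httpd-php="php:/usr/bin/php-cgi"
x-suphp-cgi="execute:!self"
```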

mod_suphp Configuration

/etc/httpd/conf.d/suphp.conf:
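The file I created looks like this; the module path assumes apxs installed mod_suphp.so into Apache’s modules directory:

```apache
LoadModule suphp_module modules/mod_suphp.so

suPHP_Engine on
AddHandler x-httpd-php .php .php5
suPHP_AddHandler x-httpd-php
```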

PHP Configuration

/etc/httpd/conf.d/php.conf, just comment everything out, you don’t need it

Apache Virtual Host (vhost) Configuration

This can be set in each individual vhost if you want to override. For example:
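For example (the user/group and paths are placeholders):

```apache
<VirtualHost *:80>
    ServerName staging.example.local
    DocumentRoot /home/staging/public_html
    suPHP_Engine on
    suPHP_UserGroup staging staging
    suPHP_AddHandler x-httpd-php
</VirtualHost>
```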

Almost Done…

Now restart httpd:

Refresh a PHP page and check. If it didn’t work, re-read this post or email me (contact info in my resume) and I won’t help, but I’ll refine this post and provide more information.

MySQL UDF: Perl Regular Expression Clauses

Currently working on migrating a database in MySQL. I needed to do some Perl-like regex find-and-replaces on cells. MySQL does not support this natively. It does support REGEXP/RLIKE, which is basically a LIKE clause with regular expression support – this is crap: it is only useful for lookup queries, not data manipulation. One may argue that relational databases should only be used to load and serve static data, and any manipulation of data should be done externally of the database. Well I say, “Bollocks!” in this case. When I’m on a utility server doing one-time, one-way updates to row data, I don’t care if there’s a performance hit – of course I’m not stupid enough to implement and utilize these types of queries in a production environment (there, incoming data should be prepared and optimized beforehand in order to maximize query times).

So after some really brief searching I found this little library called “lib_mysqludf_preg.” I’ll just document my installation procedure. Remember, I’m on CentOS 5.8 (recently upgraded from 5.6, went pretty smooth), oh and per usual I’m doing this as root since I haven’t broken that habit yet.

Download and Build the Module

First, a pcre module is required, so I went ahead and grabbed that:

While you’re at it, make sure you have things like make:

Create a compilation directory and grab the lib_mysqludf_preg source (double check the site for the latest stable build, at the time of this writing it is 1.0.1):
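Roughly (the exact download URL moves around, so grab it from the project site):

```shell
mkdir -p ~/src && cd ~/src
wget http://www.mysqludf.org/lib_mysqludf_preg/lib_mysqludf_preg-1.0.1.tar.gz
tar -xzf lib_mysqludf_preg-1.0.1.tar.gz
cd lib_mysqludf_preg-1.0.1
```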

Now you should be in the directory full of source code. Go ahead and perform the preliminary configuration and checks for the upcoming build:

Everything went smoothly for me. It found the mysql binary but threw the notice “ERROR 1045 (28000): Access denied for user ‘root’@’localhost’ (using password: NO)”. Don’t be alarmed; it is just making sure the mysql client is available. It also found mysql_config at /usr/bin/mysql_config and PCRE at /usr/bin/pcre-config (with pcre v6.6).

OK, so let’s “make” it:

OOPS! Looks like the initial make crapped out. I’m missing mysql dev files… soooo:
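The missing dev files come from:

```shell
yum install mysql-devel
```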

Now reconfigure and try again:

Finally! Looks like everything went smooth. OK let’s install it:

Success! Now to see if I can load the so module within MySQL:

Uh-Oh, can’t find the .so module…. read on.

Register and Install the Module in MySQL

We’re not done yet. So far all we’ve done is built the .so plugin, but we need MySQL to find it. By default, my distro put it in /usr/local/lib, but MySQL doesn’t know that exists. Why? Well my plugin_dir configuration for MySQL is blank, which means it falls back to the system’s dynamic link resolver. So I go look that one up:

…which gives me “/usr/lib64/mysql”. This is where I need to copy those modules. I’ll be honest here: I don’t know if I need just the .so or all 3 files that the build created, so I’ll copy all 3 just to be safe and give them execute permissions:

Now let’s register them with ldconfig and restart mysql (again, to be honest, I’m not sure the mysql restart is required – actually, I don’t think it is, but better safe than sorry).

Now install the user-defined functions:
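The source tree ships a SQL script with the CREATE FUNCTION statements (named installdb.sql in my copy; check yours), so:

```shell
mysql -p < installdb.sql
mysql -p -e "SELECT PREG_REPLACE('/[0-9]+/', 'N', 'abc 123')"
```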

Finally, test it all to make sure the installed UDFs are working:

All tests came back OK. Should be done now. That is all. kthxbye.