There are a whole set of fields in the databases I’m using here that are tilde-delimited (~) varchar strings with a mess of key-value pairs, the values from which I really need. Unfortunately, since they are varying character lengths, in no particular set order within that field, it is impossible to substring your way efficiently through them. Thankfully, there is a RegEx genius on my team who produced a handy chuck of code that pgSQL can easily recognize, parse and process for pulling precisely what I need.
substring(STUFF from 'KEYNAME=(.*?)[~]') as VALUE
Here’s the Step-by-Step:
- Use the SUBSTRING function to state that you are selecting only a portion of the string being passed in
- Specify the base text to be parsed as your first parameter
- Specify the key portion of the key-value pair from which the substring will start (technically, it starts at the end of the KEYNAME=)
- Retrieve any characters, repeated any number of times, until a ~ is found
Caveats:
- Regular Expression, from what I have read, does not find line break characters in this manner. If you need to parse something which is delimited by a new line, please read my post on how to split fields in PostgreSQL on line breaks.
Let me state, for the record, that I DO NOT know Regular Expression. If you ask me for other RegEx code or help with it, I will be of absolutely zero value. I am merely posting this here because I will otherwise forget how to do this… and it might be helpful as a baseline to others.