| www.ClassicTW.com https://mail.black-squirrel.com/ |
|
| ANSI Control Characters https://mail.black-squirrel.com/viewtopic.php?f=15&t=33923 |
Page 1 of 2 |
| Author: | SupG [ Thu Apr 04, 2013 12:27 pm ] |
| Post subject: | ANSI Control Characters |
Does anyone know of a good regex to identify all ANSI control characters? I'm currently using Code: \x1b[^m]*m |
|
| Author: | Mongoose [ Thu Apr 04, 2013 12:42 pm ] |
| Post subject: | Re: ANSI Control Characters |
The REs I used for the JTX demo app are here: https://sourceforge.net/p/jtx/code/15/t ... exer.jplex. Note the "unknownEscape" catch-all. |
|
| Author: | Micro [ Thu Apr 04, 2013 12:51 pm ] |
| Post subject: | Re: ANSI Control Characters |
This is what I am using:
Code:
//regular expression to fit ANSI control sequences
string ansiControlRegEx = (char)27 + @"\[" + "[^@-~]*" + "[@-~]"; |
|
| Author: | SupG [ Thu Apr 04, 2013 2:33 pm ] |
| Post subject: | Re: ANSI Control Characters |
Mongoose wrote: The REs I used for the JTX demo app are here: https://sourceforge.net/p/jtx/code/15/t ... exer.jplex. Note the "unknownEscape" catch-all.

Thanks, this helped. I ended up using one of the strings in there, though I found that some of them are unnecessary. I've come up with 3 regular expressions that seem to match most, if not all, of what I'll need to filter out.

Micro wrote: This is what I am using:
Code:
//regular expression to fit ANSI control sequences
string ansiControlRegEx = (char)27 + @"\[" + "[^@-~]*" + "[@-~]";

I initially thought something like this would work after reading the definition of the control sequences, but this regex actually misses quite a bit. Here are the 3 expressions I found work quite well for my purposes.
Code:
"\x1b[^m]*m"
"\x1b\[([0-9]+(;[0-9]+)*)?[Hf]"
"\x1b\[[0-9]+[A-HJKST]"

Thanks for the help guys. |
|
| Author: | Micro [ Thu Apr 04, 2013 4:33 pm ] |
| Post subject: | Re: ANSI Control Characters |
I copied mine from some sample source code, and it worked, so I didn't give much more thought to it.

You are missing AutoWrap Mode (^[?7h), save/restore cursor position (^[s / ^[u), and hide cursor (^[l). TheDraw likes to put ^[?7h at the beginning of every file it creates.

Would the following catch everything in a single expression?
Code:
"\x1b\[([0-9]+(;[0-9]+)*)(?)[A-HJKSTsuflhm]"

or simpler
Code:
"\x1b\[([0-9]+(;[0-9]+)*)(?)[A-z]"

or simplest
Code:
"\x1b\[*[A-z]" |
|
| Author: | Mongoose [ Thu Apr 04, 2013 5:00 pm ] |
| Post subject: | Re: ANSI Control Characters |
Code:
"\x1b\[*[A-z]"

You can't compress [A-Za-z] to [A-z]. There are some non-alphabetic characters in between. Also, the "\[*" in your expression says match any number of '['. I assume you meant match any number of any character, which would be ".*". But that's still probably not what you want, because of the principle of maximal munch. The regex "\x1b\[.*[A-z]" would match the entirety of "\x1b\[0;31mHello, World!\x1b\[0m", not just the "\x1b\[0;31m".

The catch-all in the lexer spec I linked roughly reads, "match an escape, followed by a '[', followed by any number of non-alphabetic characters, followed by any alphabetic character." This matches an infinite number of strings that aren't valid ANSI codes, but the important thing is that it matches all strings that are valid ANSI codes. (Or at least, all the common ones used in BBS games.) |
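Both points in the post above (the [A-z] range and maximal munch) can be checked directly. This is a minimal sketch, assuming Python's re module; the thread does not name a regex flavor, and the names sample, extras, greedy, and catch_all are made up for illustration. The catch_all pattern is one reading of the prose description of the catch-all, not a copy of the linked lexer spec.

```python
import re

ESC = "\x1b"
sample = ESC + "[0;31mHello, World!" + ESC + "[0m"

# [A-z] spans ASCII 65..122, so it also covers the six non-letters
# that sit between 'Z' and 'a'.
extras = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# Greedy ".*" runs as far as it can (maximal munch), so the match
# swallows the text and the second escape sequence too.
greedy = re.compile(r"\x1b\[.*[A-Za-z]")
print(greedy.match(sample).group() == sample)  # True

# Catch-all as described: ESC, '[', any number of non-letters, one letter.
catch_all = re.compile(r"\x1b\[[^A-Za-z]*[A-Za-z]")
print(catch_all.findall(sample))  # ['\x1b[0;31m', '\x1b[0m']
print(catch_all.sub("", sample))  # Hello, World!
```

Because the middle part excludes letters, the match has to stop at the first letter after the '[', which is what keeps it from running into the following text.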
|
| Author: | Micro [ Thu Apr 04, 2013 5:04 pm ] |
| Post subject: | Re: ANSI Control Characters |
It's still confusing to me :/ how about Code: "\x1b\[.*/^[A-Za-z]+$/" |
|
| Author: | Mongoose [ Thu Apr 04, 2013 5:08 pm ] |
| Post subject: | Re: ANSI Control Characters |
Hmm... some of the notation doesn't survive being quoted. Micro, what do the '/' and '@' mean in your regexes? I'm not familiar with that notation. And outside of a character range, I know '^' as the beginning-of-line operator. |
|
| Author: | Micro [ Thu Apr 04, 2013 5:14 pm ] |
| Post subject: | Re: ANSI Control Characters |
I don't know exactly, I'm confused :/ My original example was from a terminal emulation tutorial somewhere on the web, and the / came from here: http://stackoverflow.com/questions/6067 ... characters

In my original example, "[^@-~]*" + "[@-~]", @-~ is a range that would include much more than the alpha characters needed. I don't understand what the ^ does. |
|
| Author: | GsuP [ Thu Apr 04, 2013 5:27 pm ] |
| Post subject: | Re: ANSI Control Characters |
Thanks Micro, good stuff. I think the last two might cover too much though - could run the risk of matching characters you don't want. I guess I was just too lazy to combine my 3 expressions into one, but I like your first one - I'll use that with a slight modification. Here is what I'm gonna go with. Code: \x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm] |
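The pattern in the post above can be dropped into a small filter. This is a minimal sketch, assuming Python's re module (the thread does not say which language the pattern was used in); strip_ansi, VARIANT_RE, and the sample line are hypothetical names and data for illustration. One thing the sketch surfaces: as written, the optional '?' sits after the numeric parameters, so ESC[?7h does not match. The variant that moves the '?' in front of the digits is a suggestion, not something from the thread.

```python
import re

# Pattern copied verbatim from the post above.
ANSI_RE = re.compile(r"\x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]")

# Hypothetical variant: the optional '?' comes before the parameters,
# which is where it appears in sequences such as ESC[?7h.
VARIANT_RE = re.compile(r"\x1b\[\??([0-9]+(;[0-9]+)*)?[A-HJKSTsuflhm]")


def strip_ansi(text: str) -> str:
    """Remove every sequence the thread's pattern recognises."""
    return ANSI_RE.sub("", text)


# Made-up sample: clear screen, home, red text, reset, save cursor.
line = "\x1b[2J\x1b[1;1H\x1b[0;31mSector\x1b[0m 42\x1b[s"
print(strip_ansi(line))                            # Sector 42

# The original pattern does not match the auto-wrap sequence; the variant does.
print(ANSI_RE.search("\x1b[?7h"))                  # None
print(VARIANT_RE.fullmatch("\x1b[?7h") is not None)  # True
```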
|
| Author: | Mongoose [ Thu Apr 04, 2013 5:29 pm ] |
| Post subject: | Re: ANSI Control Characters |
If '^' is the first character in a range or set, it means "not these characters". If it's any other position in a range or set, it's a literal '^'. And outside of a range or set, if it's not escaped as a literal it means "beginning of a line". Not all regex languages support beginning and end of line operators. Part of the confusion is that there's not a universal regex language. I don't recognize the language they're using in that SO post. This site gives a pretty good overview; it was helpful to me when I was first learning about REs. |
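The roles of '^' described above can be seen side by side. This is a minimal sketch, assuming Python's re module, which is only one of the many regex flavors the post mentions; other flavors may differ, especially in whether they support anchors at all. The variable names are made up for illustration.

```python
import re

# 1. First character inside a set: negation ("not these characters").
negated = re.findall(r"[^0-9]", "a1b2")
print(negated)          # ['a', 'b']

# 2. Any other position inside a set: a literal caret.
literal_in_set = re.findall(r"[0-9^]", "2^10")
print(literal_in_set)   # ['2', '^', '1', '0']

# 3. Outside a set, unescaped: an anchor. In Python it matches at the
#    start of the string, or at the start of each line with re.MULTILINE.
text = "one two\nthree four"
anchored = re.findall(r"^\w+", text)
anchored_multiline = re.findall(r"^\w+", text, re.MULTILINE)
print(anchored)            # ['one']
print(anchored_multiline)  # ['one', 'three']

# 4. Outside a set, escaped: a literal caret again.
escaped = re.findall(r"\^", "2^10")
print(escaped)          # ['^']
```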
|
| Author: | Micro [ Thu Apr 04, 2013 5:35 pm ] |
| Post subject: | Re: ANSI Control Characters |
Mongoose, this is from your code. Will it match any ANSI string?
Code:
"\[[^a-zA-Z]*[a-zA-Z]"

What if you limit the middle a little more?
Code:
"\[[0-9;?]*[a-zA-Z]"

What you said about the ^ makes sense. What I read elsewhere did not. According to http://www.regular-expressions.info/reference.html: "^ (caret): Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well." |
|
| Author: | Micro [ Thu Apr 04, 2013 5:42 pm ] |
| Post subject: | Re: ANSI Control Characters |
GsuP wrote: Thanks Micro, good stuff. I think the last two might cover too much though - could run the risk of matching characters you don't want. I guess I was just too lazy to combine my 3 expressions into one, but I like your first one - I'll use that with a slight modification. Here is what I'm gonna go with. Code: \x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]

I don't know much about regex, but I just learned a lot from this thread. Mongoose seems to have a much better understanding. What does ?\?? do? |
|
| Author: | Mongoose [ Thu Apr 04, 2013 5:50 pm ] |
| Post subject: | Re: ANSI Control Characters |
Micro wrote: Mongoose, this is from your code. Will it match any ANSI string? Code: "\[[^a-zA-Z]*[a-zA-Z]" What if you limit the middle a little more? Code: "\[[0-9;?]*[a-zA-Z]"

I think that would work. Some languages might make you escape the '?' in the range... not sure.

Micro wrote: What does ?\?? do?

'?' means zero or one of something. So "?\??" would be zero or one of whatever was right before it, followed by zero or one literal question mark. |
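The two answers above can be tried out on small strings. This is a minimal sketch, assuming Python's re module, where '?' inside a set is literal and needs no escaping; other flavors may differ, as the post says. The pattern opt and the test strings are made up for illustration, and the \x1b prefix on NARROW is an addition here, since the quoted fragment starts at the '['.

```python
import re

# "b?" is zero or one 'b'; "\?" is a literal question mark,
# so "\??" is zero or one literal question mark.
opt = re.compile(r"ab?\??c")
results = {s: bool(opt.fullmatch(s)) for s in ["ac", "abc", "a?c", "ab?c", "abbc"]}
print(results)
# {'ac': True, 'abc': True, 'a?c': True, 'ab?c': True, 'abbc': False}

# The narrowed catch-all from the thread, with ESC prefixed (an assumption).
NARROW = re.compile(r"\x1b\[[0-9;?]*[a-zA-Z]")
for seq in ["\x1b[?7h", "\x1b[0;31m", "\x1b[s"]:
    print(repr(seq), NARROW.fullmatch(seq) is not None)  # True for all three
```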
|
| Author: | SupG [ Thu Apr 04, 2013 5:59 pm ] |
| Post subject: | Re: ANSI Control Characters |
I usually try to avoid using regular expressions because of the physical pain they cause me. Thus I don't know a lot about them. I've learned stuff today AND my wife made me a sammich. Must be a good day. |
|
| All times are UTC - 5 hours |
|