Tried solving today's using only regex, and it turns out that it's not easy to match a string containing any character consecutively repeated exactly 2 times (but no more).

For example, I want to match "abccd", but not "abcccd".

It's simple for a single character, e.g. the "c" from the example above:

(?<!c)cc(?!c)

This uses negative lookbehind and lookahead to make sure the two c's are not preceded or followed by any other c's.
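
A quick way to check the single-character pattern, sketched here with Python's re module (the thread does not say which engine was used, so Python is an assumption, and has_double_c is just an illustrative helper name):

```python
import re

# Pattern from the post: two c's, not preceded or followed by another c.
DOUBLE_C = re.compile(r"(?<!c)cc(?!c)")

def has_double_c(text: str) -> bool:
    """Return True if text contains a run of exactly two c's."""
    return DOUBLE_C.search(text) is not None

print(has_double_c("abccd"))   # True
print(has_double_c("abcccd"))  # False
```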

But if you want to make it generic, e.g. match any letter, you might try using backreferences:

(?<!\1)(\w)\1(?!\1)

This produces the error: "lookbehind assertion is not fixed length" since the regex engine cannot know how big the backref is.
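
The quoted message looks like it comes from a PCRE-style engine. As a rough cross-check, Python's re module (an assumption, not necessarily the engine used above) also refuses to compile this pattern, although it may complain about something different, since \1 is used before group 1 has been defined. The compiles helper below is hypothetical and only reports whether compilation succeeds:

```python
import re

# The generic attempt from the post, with a backreference inside the lookbehind.
GENERIC_ATTEMPT = r"(?<!\1)(\w)\1(?!\1)"

def compiles(pattern: str) -> bool:
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles(r"(?<!c)cc(?!c)"))  # the single-character version is accepted
print(compiles(GENERIC_ATTEMPT))   # the generic version is rejected
```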

Another attempt was using the \K delimiter, which allows for variable-length lookbehinds by discarding everything matched before it. However, there is no negative variant of \K, so it's of no use here.

Finally, I tried matching the character in the lookbehind, and while this kind of thing works for a positive lookbehind, e.g. this matches any character preceded by itself:

(?<=(.))\1

It does not work for negative lookbehind. This does not match anything:

(?<!(.))\1\1
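
The likely reason: a negative lookbehind only succeeds when its contents fail to match, so the group inside it never captures anything and \1 has nothing to refer to. A small sketch of both cases, assuming Python's re module (other engines may treat unset groups differently):

```python
import re

# Positive lookbehind: the group captures the previous character,
# so \1 can require the current character to be the same.
POSITIVE = re.compile(r"(?<=(.))\1")

# Negative lookbehind: whenever it succeeds, group 1 is unset,
# so the backreferences that follow can never match.
NEGATIVE = re.compile(r"(?<!(.))\1\1")

pos = POSITIVE.search("abccd")
print(pos.group(0) if pos else None)  # the second "c"
print(NEGATIVE.search("abccd"))       # no match
```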

@ihabunek this would have been my first thought

(?<=(.))[\1]{2}[^\1]

@hirojin I did think of something similar, but inside square brackets \1 is read as an octal escape, so [\1] matches the character with ASCII code 1 (base 8), and not the backref. E.g. [\60] would match 0.

As far as I can tell, it's not possible to use back-references in square brackets.
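
This is easy to confirm in Python's re module (again an assumption about the engine; others may differ in details), where a backslash followed by octal digits inside a character class is taken as a character code:

```python
import re

# Inside [...], \1 is the character with code 1, not a backreference.
matches_code_1 = re.fullmatch(r"[\1]", "\x01") is not None

# Octal 60 is decimal 48, which is the character "0".
matches_zero = re.fullmatch(r"[\60]", "0") is not None

# So (.)[\1] does not mean "a character followed by itself".
repeated = re.search(r"(.)[\1]", "cc")

print(matches_code_1)  # True
print(matches_zero)    # True
print(repeated)        # None
```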
