Programming • [Bash] PCRE: Latin letters with diacritics false match and not-match

I have tried Perl Compatible Regular Expressions (PCRE) via grep -P, and also via pcre2grep and pcregrep.
All of them show the same error: they treat Latin letters with diacritics as word boundaries.
Some of them also wrongly fail to match these letters as lower-case letters.

It should not be an encoding error, because I have set UTF-8 in all locale variables:

Code:

$ printenv|grep -P '^L[AC]'|sort
LANG=sk_SK.UTF-8
LC_ADDRESS=sk_SK.UTF-8
LC_IDENTIFICATION=sk_SK.UTF-8
LC_MEASUREMENT=sk_SK.UTF-8
LC_MONETARY=sk_SK.UTF-8
LC_NAME=sk_SK.UTF-8
LC_NUMERIC=sk_SK.UTF-8
LC_PAPER=sk_SK.UTF-8
LC_TELEPHONE=sk_SK.UTF-8
LC_TIME=sk_SK.UTF-8

Testing file:

Code:

$ cat diakritika.txt
-čí
-čia
-čo
-Evička
-Košice
-ký
-mám
-úži
-Žiar
-42úver
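
For anyone who wants to reproduce this, the file can be recreated roughly as sketched below. It assumes the ten entries shown above are stored one per line, each starting with a hyphen, and that the shell session is UTF-8, so every accented letter is typed as a single precomposed code point.

```shell
# Recreate the test file: one entry per line, each starting with a hyphen.
# Assumption: this text is entered as UTF-8 with precomposed letters.
printf '%s\n' -čí -čia -čo -Evička -Košice -ký -mám -úži -Žiar -42úver > diakritika.txt

# Dump the bytes of the first line. A precomposed "č" is the byte pair c4 8d,
# whereas "c" followed by a combining caron would be 63 cc 8c.
head -n 1 diakritika.txt | od -An -tx1
```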

IMHO wrong results:

Code:

$ grep -P '\b\p{Ll}{2}' diakritika.txt
-čia
-Evička
-Košice
-ký
-mám
-Žiar
-42úver

Code:

$ pcregrep '\b\p{Ll}{2}' diakritika.txt
-čia
-Evička
-Košice
-Žiar
-42úver

Code:

$ pcre2grep '\b\p{Ll}{2}' diakritika.txt
-čia
-Evička
-Košice
-Žiar
-42úver

Versions of commands and libraries:

Code:

$ LANG=C; grep --version
grep (GNU grep) 3.11
⋮
grep -P uses PCRE2 10.44 2024-06-07

Code:

$ pcregrep --version
pcregrep version 8.39 2016-06-14

Code:

$ pcre2grep --version
pcre2grep version 10.44 2024-06-07

Code:

$ bash --version
GNU bash, version 5.2.37(1)-release (x86_64-pc-linux-gnu)

(Of course, I switched LANG to C only after the testing, so that the version messages are printed in English.)
grep -P and pcre2grep seem to use the same library (PCRE2 10.44), but they give different results.
-ký and -mám are correct matches, but pcregrep and pcre2grep do not match them.
-úži should be in the results, but none of the commands matches it.
-Evička, -Košice, -Žiar and -42úver are all false matches, because they begin with an upper-case letter or a digit (digits are considered word characters according to Regular-Expressions.info: Word Boundaries).
Of course, each accented letter is a single precomposed Unicode character, not a base letter followed by Combining Diacritical Marks.
Am I making some mistake? Or is it a bug in the libraries (which seems improbable to me)?

Statistics: Posted by ruwolf — 2025-01-02 18:30 — Replies 0 — Views 21
