The Web Design Group

... Making the Web accessible to all.

> Pattern not working, Pattern is not limiting the number of elements
pandy
post Aug 3 2020, 08:48 PM
Post #21


Computer says no.
********

Group: WDG Moderators
Posts: 19,110
Joined: 9-August 06
Member No.: 6



This actually made me dig out my regexp book, and flicking through it I landed on Lesson 4, where among other things I can learn how to match digits and non-digits.

\d matches a digit
\D matches a non digit

Alas, \D also matches whitespace and every other non-digit character.

But...
\s matches any whitespace character
\S matches any non whitespace character

I'm sure if I read on I'll learn how to combine these into something quite sensible. Maybe this stuff isn't so bad after all.
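One way the four shorthands above can be combined, as a minimal JavaScript sketch. The sample string and variable names are made up for illustration. A negated set such as [^\d\s] matches anything that is neither a digit nor whitespace, which is narrower than \D on its own.

```javascript
// Hypothetical sample text; any string would do.
const sample = "Room 42, floor 3";

// \d+ grabs runs of digits.
const digitRuns = sample.match(/\d+/g);

// \D also matches whitespace, so the spaces end up inside these runs.
const nonDigitRuns = sample.match(/\D+/g);

// [^\d\s] means "neither a digit nor whitespace".
const neitherRuns = sample.match(/[^\d\s]+/g);

console.log(digitRuns, nonDigitRuns, neitherRuns);
```

With this sample, the \D runs still carry their spaces, while the [^\d\s] runs do not.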
Christian J
post Aug 4 2020, 07:32 AM
Post #22


.
********

Group: WDG Moderators
Posts: 8,548
Joined: 10-August 06
Member No.: 7



QUOTE(pandy @ Aug 4 2020, 03:39 AM)

And here I found one we can use if we want to be sure a proper email address is entered.

https://blog.codinghorror.com/regex-use-vs-regex-abuse/ (at the bottom)

Assuming that the regex itself is correct. I'm not going to untangle it.

Much easier to let the browser developers do the work:

CODE
<input type="email">

but apparently HTML5 relies on a simpler definition of a valid email address:

QUOTE
The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

And once the form is submitted, do we need to validate the email address on the server side as well (in addition to sanitization, which must always be done)? If so, I guess it should follow exactly the same validation rules to avoid bugs, but I have no idea whether that's the case with e.g. PHP's FILTER_VALIDATE_EMAIL. FWIW, https://www.php.net/manual/en/filter.filter...date.php#102398 says that it's too strict for internationalized domain names...
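To see what the quoted HTML5 pattern accepts, it can be tried directly in JavaScript. This is only a sketch: the helper name looksLikeEmail and the sample addresses are invented here, and it says nothing about whether PHP's FILTER_VALIDATE_EMAIL follows the same rules.

```javascript
// The pattern quoted above, copied unchanged into a JavaScript regex literal.
const html5Email = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper name; not part of any browser API.
function looksLikeEmail(value) {
  return html5Email.test(value);
}

// Made-up sample addresses.
for (const candidate of ["user@example.com", "user@localhost", "user@@example.com", "no at sign"]) {
  console.log(candidate, looksLikeEmail(candidate));
}
```

Note that the pattern does not require a dot in the domain part, so an address like user@localhost passes. That is one of the ways it is simpler than stricter validators.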


Darin McGrew
post Aug 4 2020, 08:24 AM
Post #23


WDG Member
********

Group: Root Admin
Posts: 8,338
Joined: 4-August 06
From: Mountain View, CA
Member No.: 3



QUOTE
This actually made me find my regexp book

I think that's the key for regular expressions, if you're not using them all the time. Get familiar enough to know what kinds of things they can do, and then look them up when you need the details. Especially look them up because the details of different implementations of regular expressions vary.
pandy
post Aug 4 2020, 10:39 AM
Post #24



Yeah, the different flavours are a stumbling block for me. If it were a matter of learning it once I might consider it. Besides, there's nothing easier than getting people to write a regexp for you. It's Christmas for those guys when someone asks for help, and you have at least 5 variants in no time.
(Sheesh. Now I sound like a "gimme the code" guy.)

But I'm a little inspired now. I think I'll read the book. A chapter a day I could maybe handle. It could be handy to be able to do simple stuff, to understand a little more of the expressions I see and maybe use, and also to be able to change them a little. But I'll never be good at this, as I have little practical use for it.
Christian J
post Aug 4 2020, 11:03 AM
Post #25



You could always add to the library of PATTERN attribute regexes for practice...
pandy
post Aug 4 2020, 01:47 PM
Post #26



I don't think my attempts should be trusted.
Christian J
post Aug 5 2020, 05:45 PM
Post #27



Nothing to worry about. Considering the constant patching of all kinds of software, one obviously can't trust anyone else either!
pandy
post Aug 5 2020, 06:31 PM
Post #28



You should write one.
pandy
post Aug 5 2020, 07:42 PM
Post #29



Or we just leave it to Darin.
pandy
post Aug 8 2020, 01:03 AM
Post #30



I'm too fast. I'm on lesson 4 already. I won't remember.
Christian J
post Aug 8 2020, 10:12 AM
Post #31



Out of how many lessons? :-p

Here's a challenge for you: make a tool in e.g. JavaScript that explains in clear text what a user-submitted regex actually does (similar to that site, "CSS-explain" was it?).

A tool that creates a regex from various predefined cleartext alternatives might be very useful too, but I suspect the UI will be daunting.
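As a very rough starting point for the first idea, here is a sketch that only recognises the six shorthand classes discussed in this thread. The names shorthandMeanings and explainShorthands are invented for illustration, and it is nowhere near a real regex parser. For instance, it would misread an escaped backslash followed by a letter.

```javascript
// Plain-text descriptions for the shorthand classes mentioned in this thread.
const shorthandMeanings = {
  "\\d": "a digit",
  "\\D": "a non-digit",
  "\\s": "a whitespace character",
  "\\S": "a non-whitespace character",
  "\\w": "a word character (letter, digit or underscore)",
  "\\W": "a non-word character",
};

// Hypothetical helper: returns one explanation line per shorthand found,
// in the order they appear in the pattern text. Everything else is ignored.
function explainShorthands(patternText) {
  const found = patternText.match(/\\[dDsSwW]/g) || [];
  return found.map((token) => `${token} matches ${shorthandMeanings[token]}`);
}

// Usage with a made-up pattern, passed as text rather than as a RegExp.
console.log(explainShorthands("\\d+\\s\\w").join("\n"));
```

A real tool would need to tokenise the whole pattern (groups, quantifiers, character sets), which is probably where the daunting part starts.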
pandy
post Aug 8 2020, 10:04 PM
Post #32



10. But I expect the remaining chapters to take longer...

In lesson 4 I learnt this
CODE
\w matches any alphanumeric character in upper or lower case and underscore
\W what isn't matched by the above


Sort of handy, except I don't understand why underscore is lumped together with alphanumeric characters, but I'm sure there is a reason, probably to do with some programming language or *nix. Then I tried it in Notetab. Turns out it's the other way around there! \W matches alphanumeric characters and underscore.

I'm using an older version of NTB at the moment because I had an accident with the upgraded version and haven't bothered to fix it. It changed regex engine at some point and now uses PCRE. I don't know what it used before, probably something obscure since it isn't mentioned in Help. I'm sure this will work as expected once I get around to installing the new version (which I have now got around to obtaining from the company).

What's strange is that in the NTB Help the description of \w and \W is something I don't even understand, but at first glance it looks like something different.

CODE
\w   any word delimiter. Matches any of \t\s!"&()*+,-./:;<=>?@[\]^`{|}~
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]


Maybe that means alphanumeric and underscore and the opposite, I don't know. I get the meaning of "word delimiter", but I don't see how underscore fits in with "nonword delimiter", which should mean alphanumeric characters, and it is matched together with alphanumeric characters in NTB too.
Christian J
post Aug 9 2020, 07:14 AM
Post #33



QUOTE(pandy @ Aug 9 2020, 05:04 AM)

In lesson 4 I learnt this
CODE
\w matches any alphanumeric character in upper or lower case and underscore
\W what isn't matched by the above


According to http://www.javascriptkit.com/javatutors/redev2.shtml :

CODE
\w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
\W matches any non-word characters (short for  [^a-zA-Z0-9_]).

Just A-Z as usual, in other words.
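The claimed equivalence can be checked for JavaScript by comparing \w against [a-zA-Z0-9_] over every ASCII character. This is a sketch for the JavaScript engine only, with made-up variable names. Other flavours may treat \w differently, as the Notetab experience shows.

```javascript
// Compare \w with the explicit set for character codes 0 to 127.
const shorthandWord = /^\w$/;
const explicitWord = /^[a-zA-Z0-9_]$/;

const disagreements = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (shorthandWord.test(ch) !== explicitWord.test(ch)) {
    disagreements.push(code);
  }
}

// The underscore counts as a word character; an accented letter does not.
const underscoreIsWord = shorthandWord.test("_");
const accentedIsWord = shorthandWord.test("é");

console.log(disagreements.length, underscoreIsWord, accentedIsWord);
```

The accented-letter check is the practical meaning of "just A-Z": names like "José" are not fully matched by \w+ in JavaScript.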

QUOTE
Strange is that in NTB Help the description of \w and \W is something that I don't even understand but at first glance looks like something different.
CODE
\w   any word delimiter. Matches any of \t\s!"&()*+,-./:;<=>?@[\]^`{|}~  

That looks like any character you can use between words (such as tabs, spaces, commas etc).

QUOTE
CODE
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]

That looks like the negation of the above. But the description "any nonword delimiter" seems misleading, shouldn't it be spelled "any non word delimiter"? ("Nonword" could mean anything that's not a word. )
pandy
post Aug 9 2020, 07:36 AM
Post #34



QUOTE(Christian J @ Aug 9 2020, 02:14 PM)

That looks like any character you can use between words (such as tabs, spaces, commas etc).

Yes, but in reality it matches any non-alphanumeric character except newline, just as in the book's description, only with \w and \W the other way around.

QUOTE
CODE
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]

That looks like the negation of the above. But the description "any nonword delimiter" seems misleading, shouldn't it be spelled "any non word delimiter"? ("Nonword" could mean anything that's not a word. )


Yes, that I understood, it's just a negation of the first. But I still don't see how the underscore fits in. The author is Swiss, which may explain possible English mistakes.

I finally got around to installing an updated version. Now it works as in the book. But I think there are lots of differences between flavours, even if hopefully not as stupid as this one. That old engine is probably more than 20 years old. One would hope they strive to make them more compatible.
Christian J
post Aug 9 2020, 12:08 PM
Post #35



QUOTE(pandy @ Aug 9 2020, 02:36 PM)

But I still don't see how the underscore fits in.

Actually neither of them contains an underscore:

CODE
\w   any word delimiter. Matches any of \t\s!"&()*+,-./:;<=>?@[\]^`{|}~
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]

pandy
post Aug 9 2020, 12:20 PM
Post #36



I meant in neither of them, the second is just a negation of the first as we said.

\t is a tab and \s is a whitespace character, so the underscore can't hide there either.

Here it is clearly expressed. Even if I don't understand WHY underscore is grouped together with letters and digits, they at least let us know that it is.
CODE
\w    Most engines: "word character": ASCII letter, digit or underscore

https://www.rexegg.com/regex-quickstart.html

Maybe I just have a bad book.
pandy
post Aug 9 2020, 12:56 PM
Post #37



QUOTE
Why is an underscore (_) not regarded as a non-word character?


QUOTE
Historical reasons, likely related to the fact, that C Identifiers can consist of letters, numbers and underscore only.


QUOTE
You can still use [\W_] to match all non-word and underscore chars.


https://stackoverflow.com/questions/4953390...-word-character

At last some logic.
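The [\W_] tip from the quoted answer can be tried out in JavaScript like this. The file name is a made-up sample. Splitting on \W+ keeps the underscore inside a piece, while splitting on [\W_]+ treats it as a separator too.

```javascript
// Hypothetical sample string containing an underscore, a hyphen, a space and a dot.
const sampleName = "my_file-name v2.txt";

// \W alone does not match the underscore, so "my_file" stays in one piece.
const splitOnNonWord = sampleName.split(/\W+/);

// [\W_] adds the underscore to the set of separators.
const splitOnNonWordOrUnderscore = sampleName.split(/[\W_]+/);

console.log(splitOnNonWord, splitOnNonWordOrUnderscore);
```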

And I dissed my book out of confusion. The first explanation of \w and \W that I quoted is from the book and it's clear. My editor's Help isn't. The new version of the Help file doesn't mention underscore either.

CODE
\w        any "word" character
\W        any "non-word" character


That you already know what a "word character" is in regexp is presumed... The Help is actually quite good otherwise, even if terse.
pandy
post Aug 9 2020, 01:17 PM
Post #38



There's actually a whole extra help file for regexp with my editor... The explanation for \w and \W is the same as in the ordinary Help though, but otherwise it seems very detailed.

I happened to read this bit that was just below the bit about \w and \W. It's sort of put-offish.

CODE
For compatibility with Perl, \s did not used to match the VT character (code 11), which made it different from the POSIX "space" class. However, Perl added VT at release 5.18, and PCRE followed suit at release 8.34. The default \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32), which are defined as white space in the "C" locale. This list may vary if locale-specific matching is taking place. For example, in some locales the "non-breaking space" character (\xA0) is recognized as white space, and in others the VT character is not.


God help us.

Unless locale means something different in this context than it usually does, I don't understand why that should matter one bit!
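For comparison, here is a small JavaScript sketch that checks each character from the quoted list, plus the non-breaking space, against \s. The labels and variable names are made up. As far as I know, JavaScript defines \s in the language specification rather than through a locale, so this result should not depend on system settings, unlike the PCRE behaviour described above.

```javascript
// Characters named in the quoted Help text, keyed by a readable label.
const whitespaceCandidates = {
  "HT (9)": "\t",
  "LF (10)": "\n",
  "VT (11)": "\v",
  "FF (12)": "\f",
  "CR (13)": "\r",
  "space (32)": " ",
  "NBSP (xA0)": "\u00a0",
};

// Record whether JavaScript's \s matches each one.
const whitespaceResults = {};
for (const [label, ch] of Object.entries(whitespaceCandidates)) {
  whitespaceResults[label] = /\s/.test(ch);
}

console.log(whitespaceResults);
```

In JavaScript every entry comes out as matched, including VT and the non-breaking space, which is exactly the kind of per-engine detail the Help text is warning about.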

Talking about locale: one thing that annoys me no end is the decimal point. I have Windows in English, but my locale is of course Sweden. I know I can choose which decimal point to use, but I want comma to be the default for obvious reasons. Now, I use a bookkeeping program that has a very nice feature. It's American, so it wants a period as the decimal point when I enter a sum. But it has the good taste to interpret the decimal point key on the number pad as a period, even though my default setting is comma.

That would be a very good feature for any editor, to let the user choose what character the decimal point key on the number pad should produce. Notetab at least has the good taste to accept both comma and period in calculations and I can mix them freely, but when writing other scripts that of course doesn't work.
pandy
post Aug 10 2020, 11:13 AM
Post #39



Here is something both good and bad. Many engines support POSIX character classes. They seem to sort of duplicate some of the squiggly stuff, but are much more readable.

For example...
CODE
[:alnum:] any letter or digit (same as [a-zA-Z0-9])
[:alpha:] any letter (same as [a-zA-Z])

Bad thing: JavaScript doesn't support POSIX. Good thing: I think PHP does. And so does my editor.

If everything looked like that, I would at least have a much easier time learning this.
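Since JavaScript lacks the POSIX classes, the explicit ASCII sets have to be written out there. The sketch below (variable names invented) also shows what happens if the POSIX syntax is used anyway in a plain JavaScript regex without flags: it is not rejected, it is silently read as an ordinary set of the characters "[", ":", "a", "l", "p", "h" followed by a literal "]".

```javascript
// Explicit stand-ins for the alnum and alpha POSIX classes (ASCII only).
const alnumOnly = /^[a-zA-Z0-9]+$/;
const alphaOnly = /^[a-zA-Z]+$/;

// Looks like a POSIX class, but JavaScript (no flags) parses it as
// the set [ : a l p h  followed by a literal closing bracket.
const notReallyPosix = /[[:alpha:]]/;

console.log(alnumOnly.test("abc123"), alphaOnly.test("abc123"));
console.log(notReallyPosix.test("b"), notReallyPosix.test("a]"));
```

So a pattern copied from PHP or an editor can appear to work in JavaScript while matching something quite different.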