HTMLHelp Forums _ Markup (HTML, XHTML, XML) _ Pattern not working

Posted by: spalisetty Aug 2 2020, 10:24 AM

Hello,
I have written the code to make sure that the number of characters is between 5 and 10. But it accepts more than that, and when I click on Register it also goes to a different page. Kindly let me know where I am going wrong.

<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>Registration form</title>
</head>
<body>
<form class="" action="ThankYou.html" method="Get">
<h1>Registration form</h1>
<h2>All Fields are Mandatory</h2>
<label for="uname">Name</label>
<input id = "uname" type="text" name="username" placeholder="Enter Name" pattern=".{5, 10}" title="Please
enter a name between length 5 and 10" required><br>
<label for="uemail">Email</label>
<input id = "uemail" type="email" name="email" placeholder="Enter Email" required><br>
<label for="uage">Age</label>
<input id = "uage" type="text" name="age" placeholder="Enter Age" required><br>

<h2>Are you interested in Dating?</h2>
<label for="yessir">Yes</label>
<input id = "yessir" type="radio" name="dating" value="Yes">
<label for="nosir">No</label>
<input id = "notesir" type="radio" name="dating" value="No">

<h2>Your expectation like</h2>
<select class="" name="expectation">
<option value="actress1">sumalatha</option>
<option value="actress2">suhasini</option>
<option value="actress3">jayasudha</option>
<option value="actress4">jayaprada</option>
</select>

<h2>How many Marriages you want?</h2>
<select class="" name="numberofmarriages">
<option value="numberofmarr1">1</option>
<option value="numberofmarr2">2</option>
<option value="numberofmarr3">3</option>
<option value="numberofmarr4">4</option>
</select>

<h2>Are You Heavy Alcoholic</h2>
<label for="yesalcohol">Yes</label>
<input id = "yesalcohol" type="checkbox" name="alcohol" value="Yesal">
<label for="noalcohol">No</label>
<input id = "noalcohol" type="checkbox" name="alcohol" value="Noal">
<label for="somealcohol">Some Times</label>
<input id = "somealcohol" type="checkbox" name="alcohol" value="soal">

<h2>Your Preferences and Extra Information</h2>
<<textarea name="box" rows="8" cols="80"></textarea>

<input type="submit" name="submit" value="Register">
</form>
</body>
</html>

Posted by: pandy Aug 2 2020, 11:25 AM

What do you mean it goes to a different page as well? Like a popup? I get to the thankyou page, no more. Are you sure what you show us is all there is?

BTW you have an extra "<" here.

CODE
<<textarea name="box" rows="8" cols="80"></textarea>
^^

Posted by: spalisetty Aug 2 2020, 12:28 PM

Thank you for the reply. Sorry for not explaining clearly. In the code I highlighted, I have a pattern that says the length should be between 5 and 10 characters. Even though I entered more than 10 alphanumeric characters, when I click on submit it goes to the ThankYou.html page. HTML validation is not working as expected. Kindly find my mistake.





Posted by: Christian J Aug 2 2020, 01:18 PM

Seems the space is causing problems:

CODE
.{5, 10}

while this works:

CODE
.{5,10}

(But do you really want to allow any kind of character in a name, including whitespaces?)

Also keep in mind that older browsers don't support the PATTERN attribute, so make sure to validate the form data in the server-side script as well.
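
A minimal sketch of why the space matters, assuming the browser does roughly what the HTML specification describes: it compiles the attribute value as a JavaScript regular expression with a Unicode flag, anchors it to the whole value, and silently skips the check if compilation fails. The helper names `compilePattern` and `isValidValue` are made up for illustration; this is not browser source code.

```javascript
// Hypothetical helper: mimics how a browser is assumed to compile a
// pattern attribute (anchored to the whole value, Unicode flag set).
function compilePattern(pattern) {
  try {
    return new RegExp("^(?:" + pattern + ")$", "u");
  } catch (err) {
    // An invalid pattern is ignored rather than reported to the user.
    return null;
  }
}

// Hypothetical helper: a value passes if there is no usable pattern
// or if the whole value matches it.
function isValidValue(pattern, value) {
  const regex = compilePattern(pattern);
  return regex === null || regex.test(value);
}

const longName = "abcdefghijklmnop"; // 16 characters

console.log(compilePattern(".{5, 10}"));            // null: "{5, 10}" is not a valid quantifier
console.log(isValidValue(".{5, 10}", longName));    // true: nothing is checked, the form submits
console.log(isValidValue(".{5,10}", longName));     // false: too long, submission is blocked
console.log(isValidValue(".{5,10}", "spalisetty")); // true: 10 characters
```

If that assumption holds, the version with the space never validates anything, which would also explain why no error styling shows up on the name field.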



Posted by: pandy Aug 2 2020, 01:30 PM

Oh, now I understand. Are you sure the regex is correct? Looks right to me, but I don't really know regex.

Seems to differ what browsers do with this. Firefox marks an input that isn't correct with a red border and refuses to submit the page. But it doesn't do that with your name input, no border, so something must be wrong with the syntax I guess.

Sorry I can't be of more help.

Posted by: spalisetty Aug 2 2020, 01:38 PM

QUOTE(Christian J @ Aug 2 2020, 01:18 PM) *

Seems the space is causing problems:

CODE
.{5, 10}

while this works:

CODE
.{5,10}

(But do you really want to allow any kind of character in a name, including whitespaces?)

Also keep in mind that older browsers don't support the PATTERN attribute, so make sure to validate the form data in the server-side script as well.

Perfect. Thank you so much. I am learning to create a static website.

Posted by: spalisetty Aug 2 2020, 01:38 PM

QUOTE(pandy @ Aug 2 2020, 01:30 PM) *

Oh, now I understand. Are you sure the regex is correct? Looks right to me, but I don't really know regex.

Seems to differ what browsers do with this. Firefox marks an input that isn't correct with a red border and refuses to submit the page. But it doesn't do that with your name input, no border, so something must be wrong with the syntax I guess.

Sorry I can't be of more help.

Thank You so much for trying to help me. Christian helped me.

Posted by: pandy Aug 2 2020, 02:20 PM

QUOTE(Christian J @ Aug 2 2020, 08:18 PM) *

Seems the space is causing problems:


Duh. Thought I tried that, but obviously I didn't. blush.gif

Are you good at this? Do you know how to match only letters but ALL letters? All examples I find are using a-z.

Posted by: Christian J Aug 2 2020, 02:29 PM

QUOTE(pandy @ Aug 2 2020, 09:20 PM) *

Are you good at this?

No. wacko.gif

QUOTE
Do you know how to match only letters but ALL letters? All examples I find are using a-z.

You mean all existing alphabets? I don't think there's any easy way to do that. If the alphabetical characters all sit in one contiguous Unicode range, maybe that range could be used as the delimiter? Or maybe you could try matching all characters except numbers, whitespace and special characters? Or just make a list of all letters from all alphabets. unsure.gif


Posted by: pandy Aug 2 2020, 04:43 PM

Nothing specific, but more than ANSI anyway. Say I want at least iso-latin. I don't understand how it should be written. Or, if the encoding is UTF-8, why not all letters covered by that?

I've meant to learn several times, even bought a book once. But I procrastinate. Thing is that my searching needs aren't that big. I would use regexp so seldom I would forget and have to read up all the time. And I expect the initial learning curve to be steep. Notetab handles all my searching needs even without regexp and I can write a script for it in no time if needed. So it's hard to feel motivated to learn something that consists of squiggly illegible characters and probably takes forever to get one's head around... Would be for the coolness factor, I guess. tongue.gif

Posted by: Christian J Aug 2 2020, 05:47 PM

To me it seems Regex is a power tool for those using it all the time, so that they will actually remember how the syntax works. If you're just going to use it once a year or less it's probably not worth racking your brain. mellow.gif


Posted by: pandy Aug 2 2020, 08:28 PM

QUOTE(Christian J @ Aug 3 2020, 12:47 AM) *

To me it seems Regex is a power tool for those using it all the time, so that they will actually remember how the syntax works. If you're just going to use it once a year or less it's probably not worth racking your brain. mellow.gif


Exactly, that's what I meant! For me it's better to use Notetab's built-in functions. Even if what I write can be 10 lines or more where 1 would be enough had I used regexp, it's quicker to write (for me) and I still understand what I wrote a year later. The only problem is that it isn't portable. I can't stick that into pattern. And that is a little annoying. tongue.gif

Posted by: pandy Aug 2 2020, 09:06 PM

Add to that there are oodles of flavours of regexp. I tried spalisetty's expression in Notetab. Didn't find a thing. I think Notetab uses PCRE, which obviously differs from the JS flavour. So it isn't like learn it once and you can use it anywhere.

Posted by: pandy Aug 3 2020, 07:59 AM

From https://stackoverflow.com/questions/150033/regular-expression-to-match-non-ascii-characters .

I broke the line here or this would have scrolled to China.

QUOTE

var words_in_text = function (text) {
var regex = /([\u0041-\u005A\u0061-\u007A\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02C1\u02C6-
\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377\u037A-\u037D\u0386\u0388-\u038A\u038C\u038E-
\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u0527\u0531-\u0556\u0559\u0561-\u0587\u05D0-\u05EA\u05F0-
\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-
\u072F\u074D-\u07A5\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-
\u0858\u08A0\u08A2-\u08AC\u0904-\u0939\u093D\u0950\u0958-\u0961\u0971-\u0977\u0979-\u097F\u0985-
\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-
\u09E1\u09F0\u09F1\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-
\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-
\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0B05-\u0B0C\u0B0F\u0B10\u0B13-
\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-
\u0B90\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-
\u0C0C\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C33\u0C35-\u0C39\u0C3D\u0C58\u0C59\u0C60\u0C61\u0C85-\u0C8C\u0C8E-
\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CBD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D05-\u0D0C\u0D0E-
\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D60\u0D61\u0D7A-\u0D7F\u0D85-\u0D96\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-
\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94-\u0E97\u0E99-
\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-
\u0EDF\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-
\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-
\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D\u1290-\u12B0\u12B2-
\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u1380-
\u138F\u13A0-\u13F4\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u1700-\u170C\u170E-\u1711\u1720-
\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3\u17D7\u17DC\u1820-\u1877\u1880-\u18A8\u18AA\u18B0-
\u18F5\u1900-\u191C\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19C1-\u19C7\u1A00-\u1A16\u1A20-
\u1A54\u1AA7\u1B05-\u1B33\u1B45-\u1B4B\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-
\u1C7D\u1CE9-\u1CEC\u1CEE-\u1CF1\u1CF5\u1CF6\u1D00-\u1DBF\u1E00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-
\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE\u1FC2-\u1FC4\u1FC6-
\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2071\u207F\u2090-
\u209C\u2102\u2107\u210A-\u2113\u2115\u2119-\u211D\u2124\u2126\u2128\u212A-\u212D\u212F-\u2139\u213C-
\u213F\u2145-\u2149\u214E\u2183\u2184\u2C00-\u2C2E\u2C30-\u2C5E\u2C60-\u2CE4\u2CEB-\u2CEE\u2CF2\u2CF3\u2D00-
\u2D25\u2D27\u2D2D\u2D30-\u2D67\u2D6F\u2D80-\u2D96\u2DA0-\u2DA6\u2DA8-\u2DAE\u2DB0-\u2DB6\u2DB8-\u2DBE\u2DC0-
\u2DC6\u2DC8-\u2DCE\u2DD0-\u2DD6\u2DD8-\u2DDE\u2E2F\u3005\u3006\u3031-\u3035\u303B\u303C\u3041-\u3096\u309D-
\u309F\u30A1-\u30FA\u30FC-\u30FF\u3105-\u312D\u3131-\u318E\u31A0-\u31BA\u31F0-\u31FF\u3400-\u4DB5\u4E00-
\u9FCC\uA000-\uA48C\uA4D0-\uA4FD\uA500-\uA60C\uA610-\uA61F\uA62A\uA62B\uA640-\uA66E\uA67F-\uA697\uA6A0-
\uA6E5\uA717-\uA71F\uA722-\uA788\uA78B-\uA78E\uA790-\uA793\uA7A0-\uA7AA\uA7F8-\uA801\uA803-\uA805\uA807-
\uA80A\uA80C-\uA822\uA840-\uA873\uA882-\uA8B3\uA8F2-\uA8F7\uA8FB\uA90A-\uA925\uA930-\uA946\uA960-
\uA97C\uA984-\uA9B2\uA9CF\uAA00-\uAA28\uAA40-\uAA42\uAA44-\uAA4B\uAA60-\uAA76\uAA7A\uAA80-
\uAAAF\uAAB1\uAAB5\uAAB6\uAAB9-\uAABD\uAAC0\uAAC2\uAADB-\uAADD\uAAE0-\uAAEA\uAAF2-\uAAF4\uAB01-\uAB06\uAB09-
\uAB0E\uAB11-\uAB16\uAB20-\uAB26\uAB28-\uAB2E\uABC0-\uABE2\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uF900-
\uFA6D\uFA70-\uFAD9\uFB00-\uFB06\uFB13-\uFB17\uFB1D\uFB1F-\uFB28\uFB2A-\uFB36\uFB38-
\uFB3C\uFB3E\uFB40\uFB41\uFB43\uFB44\uFB46-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-
\uFE74\uFE76-\uFEFC\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFFBE\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-
\uFF
return text.match(regex);
};

words_in_text('Düsseldorf, Köln, ??????, ???, ??????? !@#$');

// returns array ["Düsseldorf", "Köln", "??????", "???", "???????"]



QUOTE
The situation with regexes, Unicode, and Javascript sucks. It's ridiculous that programmers should have to rely on external libraries to recognize that "Αλφα" is a word, or even that "é" is a letter.

But so it goes.

This guy has written a good library for handling Unicode in Javascript Regexes:

http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode

The Unicode stuff is a plugin to this regex library:

http://xregexp.com/

Here's a post about the Unicode extension:

http://blog.stevenlevithan.com/archives/xregexp-unicode-plugin

And the extension page itself:

http://xregexp.com/plugins/

Great work but it still bums me out that Javascript is so backwards in this regard.

(He wrote a book for O'Reilly about the topic so it's quite possible that he knows what he's talking about.)

The way he implemented it is by adding tables of characters with certain properties. Then, when you construct a regex with his library, \p{charclass} gets replaced with [allthecharactersintheclass].


Get that into pattern! biggrin.gif biggrin.gif biggrin.gif

I must say that I find using regexp for pattern is rather silly and pointless if this is the situation. tongue.gif
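
Since that Stack Overflow answer was written, JavaScript has gained Unicode property escapes (ES2018), which only work when the `u` flag is set. Assuming a browser that supports them and compiles pattern with that flag, the long character list above can probably be replaced by `\p{L}`. A small sketch; `wordsInText` is a hypothetical rewrite of the quoted function and the sample strings are made up.

```javascript
// \p{L} matches any code point in the Unicode "Letter" category.
// Property escapes require the "u" flag.
const letters = /\p{L}+/gu;

// Hypothetical rewrite of words_in_text from the quoted answer.
function wordsInText(text) {
  return text.match(letters) || [];
}

console.log(wordsInText("Düsseldorf, Köln, Αλφα !@#$"));
// [ 'Düsseldorf', 'Köln', 'Αλφα' ]

// The same class anchored the way a pattern attribute would be,
// e.g. pattern="\p{L}{5,10}" (an assumption, not tested in every browser).
const nameCheck = /^(?:\p{L}{5,10})$/u;
console.log(nameCheck.test("Jürgen"));   // true
console.log(nameCheck.test("Jürgen42")); // false
```

Older browsers without property escape support would fail to compile such a pattern, so the server-side check mentioned earlier in the thread still applies.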

Posted by: Christian J Aug 3 2020, 12:30 PM

QUOTE(pandy @ Aug 3 2020, 02:59 PM) *

Get that into pattern! biggrin.gif biggrin.gif biggrin.gif

wacko.gif

QUOTE
I must say that I find using regexp for pattern is rather silly and pointless if this is the situation. tongue.gif

Especially considering that most websites were already using javascript form validation before the PATTERN attribute was created.

I'm still happy that PATTERN doesn't rely on javascript, and that it uses the browser's own standardized error message on every website. What's missing is a larger selection of readymade regular expressions, though http://html5pattern.com/ has a few.

Posted by: pandy Aug 3 2020, 03:36 PM

Are you sure it doesn't rely on JavaScript? What else could evaluate the regexp?

Posted by: Christian J Aug 3 2020, 04:25 PM

QUOTE(pandy @ Aug 3 2020, 10:36 PM) *

Are you sure it doesn't rely on JavaScript?

Of course, check with the OP's example. smile.gif

QUOTE
What else could evaluate the regexp?

The browser engine? Don't know if different browsers use their JS interpreter for this or something else, but it seems to work even if JS is disabled.

Posted by: pandy Aug 3 2020, 05:31 PM

I know, but disabling JS probably just means scripts on the page won't run. I think the browser still uses it for its internal needs. The JS flavour of regexp is used in pattern. I doubt they'd build two regexp engines into the browser just to avoid using the JS one.

I bet you that if you can find such a thing as a modern browser without JS support this won't work in it. tongue.gif

Posted by: pandy Aug 3 2020, 08:39 PM

And here I found one we can use if we want to be sure a proper email address is entered. biggrin.gif

https://blog.codinghorror.com/regex-use-vs-regex-abuse/ (at the bottom)

Posted by: pandy Aug 3 2020, 08:48 PM

This actually made me find my regexp book, and flicking through it I landed on Lesson 4, where among other things I can learn how to match digits and non-digits.

\d matches a digit
\D matches a non digit

Alas \D also includes whitespace and other non digit characters.

But...
\s matches any whitespace character
\S matches any non whitespace character

I'm sure if I read on I'll learn how to combine these into something quite sensible. Maybe this stuff isn't so bad after all. wub.gif
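
One way these shorthands combine, as a small sketch in the JavaScript flavour (the sample string is made up): inside square brackets, `[^\d\s]` means "neither a digit nor whitespace", which narrows `\D` down to what is probably wanted.

```javascript
const sample = "Room 42, floor 7";

// \D alone also picks up the spaces and the comma.
console.log(sample.match(/\D+/g)); // [ 'Room ', ', floor ' ]

// [^\d\s] excludes digits and whitespace in one character class.
console.log(sample.match(/[^\d\s]+/g)); // [ 'Room', ',', 'floor' ]

// \S+ splits on whitespace only, so digits and punctuation stay.
console.log(sample.match(/\S+/g)); // [ 'Room', '42,', 'floor', '7' ]
```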

Posted by: Christian J Aug 4 2020, 07:32 AM

QUOTE(pandy @ Aug 4 2020, 03:39 AM) *

And here I found one we can use if we want to be sure a proper email address is entered. biggrin.gif

https://blog.codinghorror.com/regex-use-vs-regex-abuse/ (at the bottom)

Assuming that the regex itself is correct. I'm not going to untangle it. blink.gif

Much easier to let the browser developers do the work:

CODE
<input type="email">

but apparently HTML5 relies on a simpler definition of a valid e-mail address (https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address):

QUOTE
The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

And once the form is submitted, do we need to use email validation on the server-side as well (in addition to sanitization, which must always be used)? If so I guess it should follow the exact same validation rules, to avoid bugs, but I have no idea if that's the case with e.g. PHP's validate filters (https://www.php.net/manual/en/filter.filters.validate.php). FWIW, a user note on that page (https://www.php.net/manual/en/filter.filters.validate.php#102398) says that it's too strict for Internationalized domain names...
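
If the server-side check should follow the same rules as type="email", one option is to reuse the expression quoted from the specification. A minimal sketch, assuming a JavaScript (e.g. Node.js) backend rather than PHP; `isValidEmail` is a hypothetical helper name and the sample addresses are invented.

```javascript
// Regular expression copied from the quoted passage of the HTML specification.
const EMAIL_RE = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: true only for strings that match the spec's expression.
function isValidEmail(value) {
  return typeof value === "string" && EMAIL_RE.test(value);
}

console.log(isValidEmail("user@example.com"));   // true
console.log(isValidEmail("user@localhost"));     // true: a dot in the domain is not required
console.log(isValidEmail("user@"));              // false
console.log(isValidEmail("müller@example.com")); // false: non-ASCII local part is rejected
```

Passing this check only says the string has the expected shape, not that the mailbox exists.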



Posted by: Darin McGrew Aug 4 2020, 08:24 AM

QUOTE
This actually made me find my regexp book

I think that's the key for regular expressions, if you're not using them all the time. Get familiar enough to know what kinds of things they can do, and then look them up when you need the details. Especially look them up because the details of different implementations of regular expressions vary.

Posted by: pandy Aug 4 2020, 10:39 AM

Yeah, different flavours are a stumbling block for me. If it were a matter of learning it once I might consider it. Besides, there's nothing easier than getting people to write a regexp for you. It's Christmas for those guys when someone asks for help, and you have at least 5 variants in no time. biggrin.gif
(Sheesh. Now I sound like a "Gimme the code guy". blush.gif)

But I'm a little inspired now. I think I'll read the book. A chapter a day I could maybe handle. It could be handy to be able to do simple stuff, to understand a little more of the expressions I see and maybe use, and also to be able to change them a little. But I'll never be good at this as I have little practical use for it.

Posted by: Christian J Aug 4 2020, 11:03 AM

You could always add to the library of PATTERN attribute regexes for practice...

Posted by: pandy Aug 4 2020, 01:47 PM

I don't think my attempts should be trusted. biggrin.gif

Posted by: Christian J Aug 5 2020, 05:45 PM

Nothing to worry about, considering the constant patching of all kinds of software one obviously can't trust anyone else either!

Posted by: pandy Aug 5 2020, 06:31 PM

You should write one. rolleyes.gif

Posted by: pandy Aug 5 2020, 07:42 PM

Or we just leave it to Darin. laugh.gif

Posted by: pandy Aug 8 2020, 01:03 AM

I'm too fast. I'm on lesson 4 already. I won't remember.

Posted by: Christian J Aug 8 2020, 10:12 AM

Out of how many lessons? :-p

Here's a challenge for you: make a tool in e.g. javascript that explains in clear text what a user-submitted regex actually does (similar to that site, "CSS-explain" was it?).

A tool that creates a regex from various predefined cleartext alternatives might be very useful too, but I suspect the UI will be daunting.

Posted by: pandy Aug 8 2020, 10:04 PM

10. But I expect the remaining chapters to take longer...

In lesson 4 I learnt this

CODE
\w matches any alphanumeric character in upper or lower case and underscore
\W what isn't matched by the above


Sort of handy, except I don't understand why underscore is lumped together with alphanumeric characters, but I'm sure there is a reason, probably to do with some programming language or *nix. Then I tried it in Notetab. Turns out there it's the other way around! \W matches alphanumeric and underscore. laugh.gif

Now I use an older version of NTB because I had an accident with the upgraded version and haven't bothered to fix it. It changed regex engine at some point and now uses PCRE. Don't know what it used before, something obscure probably since it isn't mentioned in Help. I'm sure this will work as expected once I get around to installing the new version (which I have now got around to obtaining from the company).

Strange is that in NTB Help the description of \w and \W is something that I don't even understand but at first glance looks like something different.

CODE
\w   any word delimiter. Matches any of \t\s!"&()*+,-./:;<=>?@[\]^`{|}~
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]


Maybe that means alphanumeric and underscore and the opposite, I don't know. I get the meaning of "word delimiter", but I don't see how underscore fits in with "nonword delimiter", which should mean alphanumeric characters, and it is matched together with alphanumeric characters in NTB too. wacko.gif

Posted by: Christian J Aug 9 2020, 07:14 AM

QUOTE(pandy @ Aug 9 2020, 05:04 AM) *

In lesson 4 I learnt this
CODE
\w matches any alphanumeric character in upper or lower case and underscore
\W what isn't matched by the above


According to http://www.javascriptkit.com/javatutors/redev2.shtml :

CODE
\w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
\W matches any non-word characters (short for  [^a-zA-Z0-9_]).

Just A-Z as usual, in other words.
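
A quick way to see the "just A-Z" limitation, sketched in the JavaScript flavour with made-up sample words: `\w` accepts the underscore but not accented letters.

```javascript
const wordOnly = /^\w+$/;

console.log(wordOnly.test("pandy_2020")); // true: letters, digits and underscore
console.log(wordOnly.test("Köln"));       // false: "ö" is outside [a-zA-Z0-9_]
console.log("Köln".match(/\w+/g));        // [ 'K', 'ln' ]
```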

QUOTE
Strange is that in NTB Help the description of \w and \W is something that I don't even understand but at first glance looks like something different.
CODE
\w   any word delimiter. Matches any of \t\s!"&()*+,-./:;<=>?@[\]^`{|}~  

That looks like any character you can use between words (such as tabs, spaces, commas etc).

QUOTE
CODE
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]

That looks like the negation of the above. But the description "any nonword delimiter" seems misleading, shouldn't it be spelled "any non word delimiter"? ("Nonword" could mean anything that's not a word. )

Posted by: pandy Aug 9 2020, 07:36 AM

QUOTE(Christian J @ Aug 9 2020, 02:14 PM) *

That looks like any character you can use between words (such as tabs, spaces, commas etc).

Yes, but in reality it matches any non alphanumeric character except new line, just as the description in the book, only it has \w and \W the other way around.

QUOTE
CODE
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]

That looks like the negation of the above. But the description "any nonword delimiter" seems misleading, shouldn't it be spelled "any non word delimiter"? ("Nonword" could mean anything that's not a word. )


Yes, that I understood, it's just a negation of the first. But I still don't see how the underscore fits in. The author is Swiss, that may explain possible English mistakes. tongue.gif

I finally got around to installing an updated version. Now it works as in the book. But I think there are lots of differences between flavours, even if hopefully not as stupid as this one. That old engine is probably more than 20 years old. One would hope they strive to make them more compatible.

Posted by: Christian J Aug 9 2020, 12:08 PM

QUOTE(pandy @ Aug 9 2020, 02:36 PM) *

But I still don't see how the underscore fits in.

Actually none of them contain underscores:

CODE
\w   any word delimiter. Matches any of \t\s!"&()*+,-./:;<=>?@[\]^`{|}~
\W   any nonword delimiter. Equivalent to [^\t\s!"&()*+,-./:;<=>?@[\]^`{|}~]

unsure.gif

Posted by: pandy Aug 9 2020, 12:20 PM

I meant in neither of them, the second is just a negation of the first as we said.

\t is a tab and \s is a whitespace character, so the underscore can't hide there either.

Here it was clearly expressed. Even if I don't understand WHY underscore is grouped together with letters and digits they at least let us know that it is.

CODE
\w    Most engines: "word character": ASCII letter, digit or underscore

https://www.rexegg.com/regex-quickstart.html

Maybe I just have a bad book.

Posted by: pandy Aug 9 2020, 12:56 PM

QUOTE
Why is an underscore (_) not regarded as a non-word character?


QUOTE
Historical reasons, likely related to the fact, that C Identifiers can consist of letters, numbers and underscore only.


QUOTE
You can still use [\W_] to match all non-word and underscore chars.


https://stackoverflow.com/questions/49533901/why-is-an-underscore-not-regarded-as-a-non-word-character

At last some logic. laugh.gif

And I dissed my book out of confusion. The first explanation of \w and \W that I quoted is from the book and it's clear. My editor's Help isn't. The new version of the Help file doesn't mention underscore either.

CODE
\w        any "word" character
\W        any "non-word" character


That you already know what a "word character" is in regexp is presumed... The Help is actually quite good otherwise, even if terse.
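
The `[\W_]` trick from the quoted answer, and its negation `[^\W_]` (ASCII letters and digits only, no underscore), sketched in the JavaScript flavour with an invented sample string:

```javascript
const sample = "user_name-01";

// [\W_] : every non-word character, plus the underscore.
console.log(sample.split(/[\W_]+/)); // [ 'user', 'name', '01' ]

// [^\W_] : word characters minus the underscore, i.e. [a-zA-Z0-9].
const alnumOnly = /^[^\W_]+$/;
console.log(alnumOnly.test("username01")); // true
console.log(alnumOnly.test("user_name"));  // false
```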

Posted by: pandy Aug 9 2020, 01:17 PM

There's actually a whole extra help file for regexp with my editor... The explanation for \w and \W is the same as in the ordinary Help though, but otherwise it seems very detailed.

I happened to read this bit that was just below the bit about \w and \W. It's sort of put-offish.

CODE
For compatibility with Perl, \s did not used to match the VT character (code 11), which made it different from the POSIX "space" class. However, Perl added VT at release 5.18, and PCRE followed suit at release 8.34. The default \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32), which are defined as white space in the "C" locale. This list may vary if locale-specific matching is taking place. For example, in some locales the "non-breaking space" character (\xA0) is recognized as white space, and in others the VT character is not.


God help us. wacko.gif

If locale doesn't have another meaning in this context than otherwise, I don't understand why that should matter one bit! ninja.gif

Talking about locale. One thing that annoys me no end is the decimal point. I have Windows in English but my locale is of course Sweden. I know I can choose what decimal point to use, but I want comma to be the default for obvious reasons. Now I use a bookkeeping program that has a very nice feature. It's American, so it wants a period as the decimal point when I enter a sum. But it has the good taste to interpret the decimal point key on the number pad as a period, in spite of my default setting being comma.

That would be a very good feature for any editor, to let the user choose what character the decimal point key on the number pad should produce. Notetab at least has the good taste to accept both comma and period in calculations and I can mix them freely, but when writing other scripts that of course doesn't work.

Posted by: pandy Aug 10 2020, 11:13 AM

Here is something both good and bad. Many engines support POSIX character classes. They seem to sort of duplicate some of the squiggly stuff, but are much more readable.

For example...
CODE
[:alnum:] any letter or digit (same as [a-zA-Z0-9])
[:alpha:] any letter (same as [a-zA-Z])

Bad thing: JavaScript doesn't support POSIX. Good thing: I think PHP does. And so does my editor.

If everything looked like that, I would at least have a much easier time learning this.
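
Since JavaScript has no POSIX classes, one workaround is to write the readable names and expand them before building the regex. A rough sketch only: `POSIX_CLASSES` and `expandPosixClasses` are hypothetical names, the table covers just three ASCII classes, and the expansion is a plain text substitution rather than a real parser.

```javascript
// Hypothetical lookup table: POSIX class name -> JavaScript character range.
const POSIX_CLASSES = {
  alnum: "a-zA-Z0-9",
  alpha: "a-zA-Z",
  digit: "0-9",
};

// Hypothetical helper: replaces [:name:] with its range, leaving unknown names alone.
function expandPosixClasses(source) {
  return source.replace(/\[:([a-z]+):\]/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(POSIX_CLASSES, name)
      ? POSIX_CLASSES[name]
      : whole
  );
}

console.log(expandPosixClasses("[[:alpha:]]{5,10}")); // [a-zA-Z]{5,10}
console.log(expandPosixClasses("[[:alnum:]_]"));      // [a-zA-Z0-9_]

const lettersOnly = new RegExp("^" + expandPosixClasses("[[:alpha:]]{5,10}") + "$");
console.log(lettersOnly.test("Pandy"));  // true
console.log(lettersOnly.test("Pandy1")); // false
```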
