The Web Design Group

... Making the Web accessible to all.

[revised] How to insert a semicolon before the first <br> line break in every <p> paragraph? (Thank you, everyone)
S.mutans
post Sep 30 2016, 11:01 PM
Post #1





Group: Members
Posts: 7
Joined: 27-September 16
Member No.: 24,862




How do I insert a semicolon before the first line break in every paragraph?

Thank you, everyone.


This is a dictionary text file.
Every paragraph is enclosed like this:
<p class="calibre_11">paragraph1</p>.

I want to add a semicolon
before the first line break
in every paragraph.

I want to ask about
1. the necessary tools and
2. the method
to make this change:

Find and replace all

CODE
<p class="calibre_11">
<span class="bold">
entry1
</span>
different contents
<br class="calibre1"/> // want to insert a semicolon before the 1st breakline
meaning1
<br class="calibre1"/> // other breaklines remain unchanged
meaning2
<br class="calibre1"/> // other breaklines remain unchanged
meaning3
</p>


to

CODE
<p class="calibre_11">
<span class="bold">
entry1
</span>
different contents
;<br class="calibre1"/> // inserted a semicolon before the 1st breakline  
meaning1
<br class="calibre1"/> // other breaklines remain unchanged
meaning2
<br class="calibre1"/> // other breaklines remain unchanged
meaning3
</p>
Christian J
post Oct 1 2016, 06:27 PM
Post #2


.
********

Group: WDG Moderators
Posts: 9,630
Joined: 10-August 06
Member No.: 7



I haven't looked at the linked 60MB file, but it seems all the HTML is contained in a single large file, which complicates things.

Perhaps you can use regular expressions in a text editor's Find and Replace tool, but I don't know which regular expression to use.

Another much simpler way (for me at least) might be to let a script (e.g. PHP) print out a new version. (Client-side javascript might be used too, if browsers can handle such large files.)
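
For what it's worth, the script approach could look roughly like the sketch below (JavaScript here rather than PHP). It is only an illustration: the function name is made up, and it assumes every paragraph opens with exactly <p class="calibre_11"> and that the line break tags start with <br, as in the sample in post #1.

```javascript
// Insert ";" directly before the first <br ...> inside each
// <p class="calibre_11"> paragraph. Hypothetical helper, not from the thread.
function addSemicolonBeforeFirstBreak(html) {
  // Group 1: the opening tag plus everything up to the first <br,
  //          never running past the paragraph's closing </p>.
  // Group 2: the start of that first <br tag.
  const pattern = /(<p class="calibre_11">(?:(?!<\/p>)[\s\S])*?)(<br\b)/g;
  return html.replace(pattern, '$1;$2');
}
```

Paragraphs that contain no <br at all are left as they are. To make the change permanent, the file would have to be read into a string, passed through the function and written back out.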
pandy
post Oct 1 2016, 08:59 PM
Post #3


🌟Computer says no🌟
********

Group: WDG Moderators
Posts: 20,716
Joined: 9-August 06
Member No.: 6



Is this what you want?
http://filenurse.com/download/59125c06856c...15f8e6c8eb.html


My regex skills are limited, but luckily my text editor is very capable with text through its internal programming language. It took me half a minute to write the script, and then I spent 4 minutes or so watching the screen flicker while it did all the replacements.

Hope it is what you want and hope you can use it. If you need to do this to more documents I can give you the few lines of code it takes, but you would need to download the editor in question to make use of it.
Christian J
post Oct 2 2016, 07:44 AM
Post #4


QUOTE(pandy @ Oct 2 2016, 03:59 AM)

I can give you the few lines of code it takes

I want to see too!
pandy
post Oct 2 2016, 08:58 AM
Post #5


Sure. Things like this can be done in a very simplistic way, even if the language is capable of more advanced stuff too. The script below sort of mimics what we would have done if we had done it manually.

CODE
^!Jump text_start
:loop
^!Find "<p class="calibre_11">"
^!IfError end
^!Find "<br class="calibre1"/>"
^!Jump select_start
^!InsertText;
^!Goto loop


Basically, find the P in question, then find the BR in question, insert the semicolon. Loop to find the next P... To avoid screen flicker while this goes on I could have added a line to turn screen update off, but I didn't bother. That would also have made the execution a little quicker, but the document would have looked blank until the script had run its course. As it is, it scrolls and flickers as it's edited.
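
The same find, find, insert loop can be sketched outside the editor too. The JavaScript below is only an approximation of the clip above, not something the editor would accept; the function and constant names are invented for the example.

```javascript
// Search strings taken from the clip above.
const P_OPEN = '<p class="calibre_11">';
const BR_TAG = '<br class="calibre1"/>';

// Hypothetical helper: returns a copy of text with ";" put in front of
// the first BR found after each paragraph opening.
function insertSemicolons(text) {
  let result = '';
  let cursor = 0; // plays the role of the editor's cursor

  while (true) {
    // Find the next paragraph of interest; stop when there is none left.
    const pStart = text.indexOf(P_OPEN, cursor);
    if (pStart === -1) break;

    // Find the next BR after that paragraph's opening tag.
    const brStart = text.indexOf(BR_TAG, pStart + P_OPEN.length);
    if (brStart === -1) break;

    // Copy everything up to the BR, then add the semicolon in front of it.
    result += text.slice(cursor, brStart) + ';';
    cursor = brStart;
  }

  // Copy whatever follows the last insertion.
  return result + text.slice(cursor);
}
```

Note that this sketch never checks where a paragraph ends, so a paragraph with no BR makes the search carry on into the next one.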

The advantage with the language, as I see it, is that you can put simple things like this together before you master the more complex bits. It's quick to write and it gets the job done. I still do things like this when I just need something once and don't want to spend time on it, since it's done in a blink this way.
pandy
post Oct 2 2016, 09:02 AM
Post #6


I just discovered that the forum tries to correct our code.

This line
CODE
^!InsertText;

should have a space before the semicolon. The forum removes it and not only for display.

I'll test if it happens even outside CODE tags.

^!InsertText ;

CODE
Blah;


Blah ;
pandy
post Oct 2 2016, 09:13 AM
Post #7


OK, so just inside CODE.

I see now that I shouldn't have looked for the whole BR tag, but rather just "<br class="calibre1"" in case the BR is sometimes written with a space before the slash. Oh well, let's hope it's consistent or my script failed. I didn't proofread the result.
Christian J
post Oct 2 2016, 09:32 AM
Post #8


Which text editor is that?

I don't think TextPad lets you write such scripts. Instead it has a "Record" function that lets you store your manual Find & Replace operations, e.g. for a single semicolon insertion. Then you can play back that same operation to the end of the file automatically.

Alas TextPad doesn't support Unicode characters, which may or may not matter to the OP.
Christian J
post Oct 2 2016, 09:39 AM
Post #9


QUOTE(pandy @ Oct 2 2016, 04:13 PM)

OK, so just inside CODE.

This seems like a forum bug, actually.

QUOTE
I see now that I shouldn't have looked for the whole BR tag, but rather just "<br class="calibre1"" in case the BR is sometimes written with a space before the slash.

One might use Regexp to search for zero or more whitespace characters (perhaps also between "br" and "class", and around the "=").
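
As a rough illustration of that idea, a pattern along these lines would tolerate the spacing variations mentioned. It is a sketch in JavaScript regex syntax; the constant and function names are invented, and it requires at least one whitespace character between "br" and "class", since the tag would not be valid without it.

```javascript
// \s* allows zero or more whitespace characters, \s+ requires at least one.
// The slash before ">" is optional, so <br class="calibre1"> matches as well.
const TOLERANT_BR = /<br\s+class\s*=\s*"calibre1"\s*\/?\s*>/;

// Hypothetical helper: true if the text contains a matching BR tag.
function hasTargetBreak(text) {
  return TOLERANT_BR.test(text);
}
```

The same pattern could be pasted into the Find field of an editor that supports this flavour of regular expressions, without the surrounding slashes.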

pandy
post Oct 2 2016, 10:20 AM
Post #10


But regex isn't needed. It's just a matter of searching for the right criteria (if we're still talking about my script).

I put the rest in a new thread in the OT forum so we don't pollute this thread.
S.mutans
post Oct 2 2016, 07:49 PM
Post #11





Never mind, I have another .ePub file
dictionary .rtf file
dictionary .ePub file
pandy
post Oct 2 2016, 08:02 PM
Post #12


So what has changed other than the file format?
pandy
post Oct 4 2016, 05:43 AM
Post #13


Also, what's wrong with the file I uploaded? Did I misunderstand what you want?
S.mutans
post Oct 5 2016, 10:18 AM
Post #14





Sorry, I almost missed your file.
Were it not for your help, I would have had to visit a psychiatrist.
Thank you, thank you, thank you.

This post has been edited by S.mutans: Oct 5 2016, 10:48 AM
pandy
post Oct 5 2016, 03:45 PM
Post #15


I take it I got it right then.
S.mutans
post Oct 5 2016, 10:21 PM
Post #16





Somebody gave me this JavaScript:

CODE
[].forEach.call(document.querySelectorAll('.calibre_11 br:first-of-type'), function (e) {
  e.insertAdjacentText('beforebegin', ';')
})


I want to convert the dictionary .ePub into an Anki deck (.apkg) file.
Anki only accepts .txt files in UTF-8 for import.
Your file is not in UTF-8, so it doesn't import correctly.

You might like to have "Collins Concise German-English Dictionary.apkg" yourself.
I can send you a copy after it is done.

This is an example of a valid file:
CODE
entry1; meanings; optional; optional; ...
entry2; meanings; optional; optional; ...
entry3; meanings; optional; optional; ...
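
To get from one marked-up paragraph to one line of that shape, something like the sketch below might do. It is only a guess at the conversion: it assumes the semicolon has already been inserted, that each paragraph should become a single line, and that all tags can simply be dropped. The function name is made up, and character references such as &amp; are left untouched.

```javascript
// Hypothetical helper: flatten one <p>...</p> paragraph into a plain text line.
function paragraphToLine(paragraphHtml) {
  return paragraphHtml
    .replace(/<[^>]*>/g, ' ') // drop every tag, leaving a space in its place
    .replace(/\s+/g, ' ')     // collapse runs of whitespace, including newlines
    .replace(/ ;/g, ';')      // no space before a semicolon
    .trim();
}

// Example call with a paragraph shaped like the sample in post #1.
const exampleLine = paragraphToLine(
  '<p class="calibre_11"><span class="bold">entry1</span> different contents;' +
  '<br class="calibre1"/>meaning1<br class="calibre1"/>meaning2</p>'
);
console.log(exampleLine); // prints "entry1 different contents; meaning1 meaning2"
```

The lines produced this way would still have to be saved as a UTF-8 .txt file before importing.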


This post has been edited by S.mutans: Oct 5 2016, 10:50 PM
S.mutans
post Oct 5 2016, 11:33 PM
Post #17





Some <p>paragraphs</p> are left without any semicolon inside.
For those I want to insert a semicolon after the closing </p>, as in insertAdjacentText('afterend', ';').
Please help me. How can I do it myself?
I don't even know which tools I need to download,
let alone how to use them.


Replace all the rest

CODE
<p>paragraph without any semicolon inside</p>


to

CODE
<p>paragraph without any semicolon inside</p>; // insertAdjacentText('afterend', ';')
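
One way this could be done with a script is sketched below in JavaScript. It is an illustration only: the function name is invented, it treats any ';' between <p ...> and </p> as "has a semicolon" (so a character reference such as &amp; would count too), and it assumes paragraphs are not nested.

```javascript
// Hypothetical helper: append ";" after the closing </p> of every
// paragraph that contains no semicolon at all.
function markParagraphsWithoutSemicolon(html) {
  // Match each paragraph from its opening <p ...> to the nearest </p>.
  return html.replace(/<p\b[^>]*>[\s\S]*?<\/p>/g, function (paragraph) {
    return paragraph.includes(';') ? paragraph : paragraph + ';';
  });
}
```

Paragraphs that already contain a semicolon come back unchanged, so the function can be run after the first-break insertion without doubling anything up.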


This post has been edited by S.mutans: Oct 5 2016, 11:50 PM
pandy
post Oct 6 2016, 07:23 AM
Post #18


Are any characters corrupted in my file? If not, just convert it to UTF-8.

I'm afraid I didn't understand the rest of your question.
Christian J
post Oct 6 2016, 02:21 PM
Post #19


QUOTE(S.mutans @ Oct 6 2016, 05:21 AM)

Somebody gave me this JavaScript:

CODE
[].forEach.call(document.querySelectorAll('.calibre_11 br:first-of-type'), function (e) {
  e.insertAdjacentText('beforebegin', ';')
})


Note that any changes made by JavaScript are just temporary: they affect the page as loaded in the browser, not the file on disk. Also, insertAdjacentText is a new feature, with limited browser support.

QUOTE
I want to convert the dictionary .ePub into an Anki deck (.apkg) file.

It seems Anki (https://en.wikipedia.org/wiki/Anki_(software)) does use HTML, but I don't know whether it supports JavaScript too.