[revised] How to insert a semicolon before the first <br> (line break) in every <p> (paragraph)? Thank you, everyone |
S.mutans |
Sep 30 2016, 11:01 PM
Post
#1
|
Group: Members Posts: 7 Joined: 27-September 16 Member No.: 24,862 |
How do I insert a semicolon before the first line break in every paragraph? Thank you, everyone.
This is a dictionary text file. Every paragraph is enclosed in <p class="calibre_11">paragraph1</p>. I want to add a semicolon before the first line break (<br>) in every paragraph. I would like to ask about 1. the necessary tools and 2. the method to make this change.
Find and replace all
CODE
<p class="calibre_11">
<span class="bold"> entry1 </span> different contents <br class="calibre1"/> // want to insert a semicolon before the 1st line break
meaning1 <br class="calibre1"/> // other line breaks remain unchanged
meaning2 <br class="calibre1"/> // other line breaks remain unchanged
meaning3
</p>
to
CODE
<p class="calibre_11">
<span class="bold"> entry1 </span> different contents ;<br class="calibre1"/> // inserted a semicolon before the 1st line break
meaning1 <br class="calibre1"/> // other line breaks remain unchanged
meaning2 <br class="calibre1"/> // other line breaks remain unchanged
meaning3
</p> |
Christian J |
Oct 1 2016, 06:27 PM
Post
#2
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
I haven't looked at the linked 60MB file, but it seems all the HTML is contained in a single large file, which complicates things.
Perhaps you can use regular expressions in a text editor's Find and Replace tool, but I don't know which regular expression to use. Another much simpler way (for me at least) might be to let a script (e.g. PHP) print out a new version. (Client-side JavaScript might be used too, if browsers can handle such large files.) |
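For what it's worth, the regular-expression idea above could look roughly like the sketch below. It is only an illustration, not something posted in the thread: the function name addSemicolonBeforeFirstBr is made up, and it assumes every paragraph and line break is written exactly as in the first post (<p class="calibre_11"> and <br class="calibre1"/>) and that paragraphs are not nested.

```javascript
// Sketch: insert ";" before the first <br class="calibre1"/> inside each
// <p class="calibre_11">...</p>. The function name is hypothetical.
function addSemicolonBeforeFirstBr(html) {
  // Match one paragraph at a time. [\s\S]*? is a lazy "any character" run,
  // so each match stops at the nearest closing </p>.
  const paragraph = /<p class="calibre_11">[\s\S]*?<\/p>/g;
  return html.replace(paragraph, (p) =>
    // No "g" flag on the inner pattern, so only the first <br> in this
    // paragraph is changed. "$&" re-inserts the matched <br> tag.
    p.replace(/<br class="calibre1"\/>/, ';$&')
  );
}
```

Run under Node.js, the whole file could be read into a string with fs.readFileSync(path, 'utf8'), passed through the function, and written back with fs.writeFileSync. Unlike a script run in the browser, that leaves a changed file on disk.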
pandy |
Oct 1 2016, 08:59 PM
Post
#3
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Is this what you want?
http://filenurse.com/download/59125c06856c...15f8e6c8eb.html
My regex skills are limited, but luckily my text editor is very capable with text through its internal programming language. It took me half a minute to write the script and then I spent 4 minutes or so watching the screen flicker while it did all the replacements. Hope it is what you want and hope you can use it. If you need to do this to more documents I can give you the few lines of code it takes, but you would need to download the editor in question to make use of it. |
Christian J |
Oct 2 2016, 07:44 AM
Post
#4
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
|
pandy |
Oct 2 2016, 08:58 AM
Post
#5
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Sure. Things like this can be done in a very simplistic way, even if the language is capable of more advanced stuff too. The script below sort of mimics what we would have done if we had done it manually.
CODE
^!Jump text_start
:loop
^!Find "<p class="calibre_11">"
^!IfError end
^!Find "<br class="calibre1"/>"
^!Jump select_start
^!InsertText;
^!Goto loop

Basically, find the P in question, then find the BR in question, insert the semicolon. Loop to find the next P...
To avoid screen flicker while this goes on I could have added a line to turn screen update off, but I didn't bother. Then the document would have looked blank until the script had run its course. Now it scrolls and flickers as it's edited. It would also have made the execution a little quicker.
The advantage with the language, as I see it, is that you can put simple things like this together before you master the more complex bits. It's quick to write and it gets the job done. I still do things like this when I just need something once and don't want to spend time on it, since it's done in a blink this way. |
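For readers without that editor, the same find-P, find-BR, insert, loop idea can be sketched in JavaScript with plain string searches and no regex. This is an illustration only: the function name insertSemicolons is made up, and the two tag strings are assumed to appear exactly as in the script above.

```javascript
// Sketch of the loop above using plain string searches.
// The function name is hypothetical.
function insertSemicolons(text) {
  const P_TAG = '<p class="calibre_11">';
  const BR_TAG = '<br class="calibre1"/>';
  let out = '';
  let pos = 0; // plays the role of the editor's cursor

  while (true) {
    // Find the next paragraph start; stop when there is none left.
    const p = text.indexOf(P_TAG, pos);
    if (p === -1) break;

    // Find the next BR after that paragraph start.
    const br = text.indexOf(BR_TAG, p + P_TAG.length);
    if (br === -1) break;

    // Copy everything up to the BR, then the semicolon, then the BR itself.
    out += text.slice(pos, br) + ';' + BR_TAG;
    pos = br + BR_TAG.length;
  }

  // Copy whatever follows the last edited BR.
  return out + text.slice(pos);
}
```

Like the script it mimics, this does not check that the BR it finds belongs to the same paragraph, so a paragraph with no BR at all makes it pick up the next BR further down in the file.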
pandy |
Oct 2 2016, 09:02 AM
Post
#6
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
I just discovered that the forum tries to correct our code.
This line
CODE
^!InsertText;
should have a space before the semicolon. The forum removes it, and not only for display. I'll test whether it happens outside CODE tags too.
^!InsertText ;
CODE
Blah; Blah ; |
pandy |
Oct 2 2016, 09:13 AM
Post
#7
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
OK, so just inside CODE.
I see now that I shouldn't have looked for the whole BR tag, but rather just "<br class="calibre1"" in case the BR is sometimes written with a space before the slash. Oh well, let's hope it's consistent or my script failed. I didn't proofread the result. |
Christian J |
Oct 2 2016, 09:32 AM
Post
#8
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
Which text editor is that?
I don't think TextPad lets you write such scripts, instead it has a "Record" function that lets you store your manual Find & Replace operations for e.g. a single semicolon insertion. Then you can play back that same operation to the end of the file automatically. Alas TextPad doesn't support Unicode characters, which may or may not matter to the OP. |
Christian J |
Oct 2 2016, 09:39 AM
Post
#9
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
QUOTE OK, so just inside CODE.
This seems like a forum bug, actually.
QUOTE I see now that I shouldn't have looked for the whole BR tag, but rather just "<br class="calibre1"" in case the BR is sometimes written with a space before the slash.
One might use a regexp to search for zero or more whitespace characters (perhaps also between "br" and "class", and around the "="). |
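A whitespace-tolerant pattern of that kind might look like the sketch below. It is only a guess at what such a regexp could be: the names TOLERANT_BR and insertBeforeFirstTolerantBr are made up, and the pattern assumes the class attribute is the only attribute on the tag and is written with double quotes.

```javascript
// Sketch: a BR pattern that tolerates extra whitespace.
//   \s+     at least one whitespace character between "br" and "class"
//   \s*=\s* optional whitespace around the "="
//   \s*\/?  optional whitespace before an optional slash
// The "i" flag also accepts <BR ...>. Both names below are hypothetical.
const TOLERANT_BR = /<br\s+class\s*=\s*"calibre1"\s*\/?\s*>/i;

function insertBeforeFirstTolerantBr(paragraph) {
  // No "g" flag, so only the first matching BR gets the semicolon.
  return paragraph.replace(TOLERANT_BR, ';$&');
}
```

The function works on one paragraph at a time, so it would still need to be applied per paragraph rather than to the whole file at once.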
pandy |
Oct 2 2016, 10:20 AM
Post
#10
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
But regex isn't needed. It's just a matter of searching for the right criteria (if we are still talking about my script).
I put the rest in a new thread in the OT forum so we don't pollute this thread. |
S.mutans |
Oct 2 2016, 07:49 PM
Post
#11
|
Group: Members Posts: 7 Joined: 27-September 16 Member No.: 24,862 |
|
pandy |
Oct 2 2016, 08:02 PM
Post
#12
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
So what has changed other than the file format?
|
pandy |
Oct 4 2016, 05:43 AM
Post
#13
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Also, what's wrong with the file I uploaded? Did I misunderstand what you want?
|
S.mutans |
Oct 5 2016, 10:18 AM
Post
#14
|
Group: Members Posts: 7 Joined: 27-September 16 Member No.: 24,862 |
Sorry, I was nearly unaware of your file.
Were it not for your help, I would have had to visit a psychiatrist. Thank you, thank you, thank you. This post has been edited by S.mutans: Oct 5 2016, 10:48 AM |
pandy |
Oct 5 2016, 03:45 PM
Post
#15
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
I take it I got it right then.
|
S.mutans |
Oct 5 2016, 10:21 PM
Post
#16
|
Group: Members Posts: 7 Joined: 27-September 16 Member No.: 24,862 |
Somebody gave me this JavaScript:
CODE
[].forEach.call(document.querySelectorAll('.calibre_11 br:first-of-type'), function (e) {
  e.insertAdjacentText('beforebegin', ';')
})

I want to convert the dictionary.ePub into an Anki deck (.apkg) file. Anki only accepts .txt files in UTF-8 for import. Your file is not in UTF-8, so it doesn't import correctly. There is no harm in you owning "Collins Concise German-English Dictionary.apkg"; I can send you a copy after it is done. This is an example of a valid file:
CODE
entry1; meanings; optional; optional; ...
entry2; meanings; optional; optional; ...
entry3; meanings; optional; optional; ...

This post has been edited by S.mutans: Oct 5 2016, 10:50 PM |
S.mutans |
Oct 5 2016, 11:33 PM
Post
#17
|
Group: Members Posts: 7 Joined: 27-September 16 Member No.: 24,862 |
Some paragraphs are left without any semicolon inside.
For those, I want to insert a semicolon right after the closing </p>, i.e. insertAdjacentText('afterEnd', ';'). Please help me. How can I do this myself? I don't even know which tools I need to download, let alone how to use them. Replace all the remaining
CODE
<p> paragraph without any semicolon inside </p>
to
CODE
<p> paragraph without any semicolon inside </p>; // insertAdjacentText('afterEnd', ';')

This post has been edited by S.mutans: Oct 5 2016, 11:50 PM |
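One possible reading of this request, sketched as a string operation rather than a DOM call: append ";" after the closing </p> of every paragraph that contains no semicolon. This is an assumption about what is wanted, not a confirmed solution, and the function name addSemicolonAfterBareParagraphs is made up.

```javascript
// Sketch: append ";" after </p> for paragraphs that contain no semicolon.
// The function name is hypothetical.
function addSemicolonAfterBareParagraphs(html) {
  // <p\b[^>]*> matches <p> with or without attributes, but not <pre>.
  // [\s\S]*? lazily matches up to the nearest closing </p>.
  return html.replace(/<p\b[^>]*>[\s\S]*?<\/p>/g, (p) =>
    // Leave paragraphs that already contain a semicolon untouched.
    p.includes(';') ? p : p + ';'
  );
}
```

Two caveats: an HTML entity such as &amp; contains a semicolon, so a paragraph with an entity counts as already having one, and running the function twice adds a second semicolon because the first one sits outside the paragraph.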
pandy |
Oct 6 2016, 07:23 AM
Post
#18
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Are any characters corrupted in my file? If not, just convert it to UTF-8.
I'm afraid I didn't understand the rest of your question. |
Christian J |
Oct 6 2016, 02:21 PM
Post
#19
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
QUOTE Somebody gave me this java script:
CODE
[].forEach.call(document.querySelectorAll('.calibre_11 br:first-of-type'), function (e) {
  e.insertAdjacentText('beforebegin', ';')
})

Note that any changes made by JavaScript are just temporary. Also, insertAdjacentText is a new feature with limited browser support.
QUOTE I want to change the dictionary.ePub into AnkiDecks .apkg file.
It seems Anki (https://en.wikipedia.org/wiki/Anki_(software)) does use HTML, but I don't know if it supports JavaScript too. |