Super User Asked by fando32 on January 26, 2021
I have a list of sentences in a text file and I would like to extract the part between |SENTENCE| and|TRANSLATION| to a list in a separate text file (I placed a bold and as a placeholder for what I want to extract).
Following this, I would like to edit the list and replace the portion between |TRANSLATION|and|END| with my list of edits in sequential order (meaning in the same order I extracted the text originally).
With my basic knowledge, I imagine notepad++ is able to achieve these goals of mine, but if not, I’m willing to try anything else plausible someone suggests.
(I apologize for the poor formatting and terminology. I am not a programmer nor am I very acquainted with how to format or use the best terminology for what I am looking for – I believe these sentence portions may be referred to as "expressions" in these parts – in a way that would be easily intelligible to a programmer/expert. I do believe I can follow instructions, so I would appreciate any help.)
Here is a portion of my list for an example:
|SENTENCE|4.3.2.71.0.1.0|TRANSLATION|4.3.2.71.0.1.0|END|
|SENTENCE|BGMの曲名を表示します|TRANSLATION|Displays the song name of the background music|END|
|SENTENCE|BGMの曲名を表示しません|TRANSLATION|Do not display bgm song names|END|
|SENTENCE|BGMの曲名を表示しませんヘルプを表示しません|TRANSLATION|Don't show bgm song name Don't show help|END|
|SENTENCE|BGM効果音音声動画音声をループして再生する動画を再生する|TRANSLATION|Play a video that loops and plays bgm sound effects audio video audio|END|
|SENTENCE|BGVの音量設定ショートカット設定|TRANSLATION|BGV volume setting shortcut settings|END|
An example of the desired extraction for the first two sentences to a list in a separate file:
4.3.2.71.0.1.0
BGMの曲名を表示します
An example of the edits to the list in a separate file:
4.3.2.71.622.1.0
Displays background music
An example of the desired replacement for the first two sentences in the original file:
|SENTENCE|4.3.2.71.0.1.0|TRANSLATION|4.3.2.71.622.1.0|END|
|SENTENCE|BGMの曲名を表示します|TRANSLATION|Displays background music|END|
Edit:
Anyway, I’m having a problem on the first step, extraction (I have yet to try step 2). The majority of the text is correctly extracted by the regex |SENTENCE|(.*)|TRANSLATION|.*
. However, certain text remains the same, (
Example 4:
|SENTENCE| 0001 0120/12/09
23:33「動かないで。今楽に」|TRANSLATION| 0001 0120/12/09rn 23:33 "Don't move. "|END|
Example 5:
|SENTENCE|。【梢】「気になるの?」あ「俺でよければ」
あ「べ、別に」
|TRANSLATION|。 【Na】"Are you bothered?" Ah, if you don't mind me.nAh, be, separately.n|END|`
)
I didn’t think this would happen, so I omitted posting these unusual sentences in my post examples, but now I realize my naivety. I’m suspecting this is due to the whitespace and the regex isn’t accounting for the presence of it in between lines. I read and skimmed through all of the sections of ryan’s tutorial that you linked to and I found the shorthand character classes. In particular I found s
, which apparently matches with whitespace. I plugged it in to the expression you gave me as |SENTENCE|(.*s)|TRANSLATION|.*
and I also tried |SENTENCE|(.*s*)|TRANSLATION|.*
, both of which matched with 22 sentences (
Example 6:
|SENTENCE|うーん……どうしたら、解ってくれるんだろう?
|TRANSLATION|Well let's see...... How can you solve it?n|END|
)
, but not the first two examples listed in this edit. When I tried |SENTENCE|(/s.*)|TRANSLATION|.
I received 0 matches.
I’m a bit unsure about how to correctly structure the regex so it matches examples 4 and 5 (and ideally 6 and the text matched by the first regex as well) listed in this comment, as such I’d like to ask for additional guidance/clarification on this matter (@NotTheDr01ds or anyone else who sees this and can help).
If I'm understanding your question correctly, you have two source files, let's call them original.txt
(your first example) and myedits.txt
(the third one). You need two sets of desired results, extractions.txt
(the second example), and replacements.txt
(the fourth example).
Let me know if that's still not right.
But if that's correct, then here's what I came up with.
To get extractions.txt
, the feature you are looking for is called "regular expression search and replace" (often shortened to "regex"), and I see that it is supported by Notepad++. It's a feature that's also supported (in one form or another) by hundreds of other programs.
Learning the basics of regex is something that I highly recommend. It just makes so many bulk editing tasks easier.
There are subtle differences between different regex implementations, so I had to look up just how Notepad++ does it. There's a Notepad++ regex documentations FAQ that is very helpful, but only if you already understand the basics. While that FAQ says that "noobs" should start with the Notepad++ doc, ignore that advice -- it's still too "technical" a description. Further down in that FAQ is a link to a tutorial that explains things much more clearly ... https://ryanstutorials.net/regular-expressions-tutorial/.
But back to your question at hand. Open the "Replace" dialog in Notepad++ (Ctrl+H
or using the menu Search
-> Replace
):
Image credit: Dr. Haider M. al-Khateeb's Understanding RegEx with Notepad++.
For the Find what
, try:
|SENTENCE|(.*)|TRANSLATION|.*
In regex-speak, that's "Find any characters between |SENTENCE| and |TRANSLATION| and store them. Everything after that, we don't care about."
The backslash in each |
is because the |
character is "special" in regex, so we have to "escape it" by putting a backslash before it to tell it we really do want to find an actual |
. The .
is "any character" and the *
means "as many as you can find." The parenthesis surrounding the first .*
creates a "capture group", which tell it what you want to "store" for use later in the replacement.
To create your first file, the Replace with
should be simply 1
or $1
(either should work, according to the Notepad++ docs). Again, in regex-speak, this says "Replace the entire line you found with the first set of characters I asked you to store".
Save that resulting text as extractions.txt
, then go back into the original (Undo
might work just as well).
edits.txt
to the corresponding line in original.txt
This gets a bit tricky, but there are at least two ways to pull it off in Notepad++ without having to do any "scripting."
The first method is to use Notepad++ column mode. It's a bit "hacky" here, but it works. To summarize:
In myedits.txt
hold down the Alt
key while selecting the entire file. Since you are in "column mode" (I like to think of it as "block mode"), you'll need to select the entire block of text, meaning you'll extend the selection as far to right as the longest line you have.
Copy that to the clipboard
In original.txt
, go to the end of the first line
Add a bunch of spaces to make sure the first line is the longest line in the file
With the cursor at the end of the first line, paste the "column block", making each line from myedits.txt
appear at the end of the corresponding line from original.txt
.
The column mode "paste" operation, for me left some unexpected whitespace at the end of a few lines. To get rid of that, I did a regex replace of W*$
with nothing (which just deleted the empty space).
Then the regex to swap the edited text with the original translation would be ...
|SENTENCE|(.*)|TRANSLATION|.*|END|W*(.*)W*$
|SENTENCE|$1|TRANSLATION|$2|END|
Regex explanation:
()
group, .*
, |
, and $1
above.W*
matches the blank spaces that column-mode created. It says to grab as much whitespace as it can find and to just ignore it (since we aren't putting it in a capture group). We also throw away a few errant spaces at the end of the lines that column-mode put in.$
at the end of the line means "the end of the line".|
characters in the replacement text. This is because they aren't "special" to regex replacements, just to matching.There are other ways to construct the same regex that are more concise, but probably a bit harder to understand.
myedits.txt
to original.txt
I think I prefer this method better. Recording macros essentially lets you create a "program" without "programming", just using the same set of keystrokes you would use to do the same thing manually.
The only downside (IMHO) is that you need to know how to do every operation using keystrokes, rather than the mouse.
Notepad++ has a great feature that let's you repeat the same set of actions for each line on the file.
To start with, since you're going to be recording and playing this macro back across two files, you'll need to go to Settings
-> Preferences
-> MISC
and turn off the Document Switcher
option that is on by default.
Then, open both original.txt
and myedits.txt
at the same time, so they are each in their own tab. Make sure they are the only two tabs open in Notepad++.
In original.txt
position the cursor at the beginning of the first line (Ctrl+Home
).
Macro
-> Start Recording
(this is the only thing that can be done with the mouse until we are done with the macro. Use the keyboard only from here on out).
End
(this moves the cursor to the end of the first line)
Ctrl+Tab
(switches to myedits.txt
)
Shift+End
(selects the first line)
Ctrl+X
(cuts the line to the clipboard)
Del
(Removes the resulting blank line)
Ctrl+Tab
(switches back to original.txt
)
Ctrl+V
(pastes the cut line at the end of the first line)
Down Arrow
(moves the cursor down to the next line, of course)
Home
(moves the cursor to the begging of the line
Macro
-> Stop Recording
(using the mouse)
Don't move the cursor - Leave it at the start of the second line.
The result is that we've moved your first edit from myedits.txt
to original.txt
and most importantly, we are in exactly the same place on the second line as we started recording on the first. In addition, the first line in myedits.txt
now contains the text for the second replacement sentence).
That means that if you run the macro (Macro
-> Playback
), you'll move the next line of text over onto the second line in original.txt
. Do it again, and you'll have the third.
But Notepad++ makes this even easier. Just use Macro
-> Run a Macro Multiple Times
and select Run until end end of file
to just have it repeat over each line. This moved over all but the last line (I'm not quite sure why), but running it one more time moved the last line over as well.
Ok, from here you might be able to figure out the regex needed to do the final replacement, but since we've come this far .... The same one that I created above (for the column mode) will work, since it says "Remove that extra whitespace before and after (but only if there's whitespace anyway)." But since we don't have any extra whitespace, using this method, you could simplify it as ...
|SENTENCE|(.*)|TRANSLATION|.*|END|(.*)
|SENTENCE|$1|TRANSLATION|$2|END|
(exactly the same as in "Step 2, Method 1")Alright, this turned into a bit of a "book", but I hope (a) this helps you accomplish what you need, (b) I've provided some techniques that will be useful to you in the future.
I know I've learned a bit about Notepad++. I think I would still prefer Unix-like tools for this personally, but it's always nice to have another possible tool in the belt.
Correct answer by NotTheDr01ds on January 26, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP