Perl – Great for text editing (the best!), but opening and closing input and output files got annoying after a while, so I switched to JavaScript. Below is my last Perl version (2016.11), which assumes an input file song_input.txt saved in the same location as the code and the native Windows program wordpad.exe available to open it. You can comment out the system() calls and just manually edit the input file if you don’t have WordPad.
More details at my old LiveJournal site’s resources page.
song_transform.pl (2016.11 version with improved special character handling) requires input file song_input.txt
#!/pkg/bin/perl -w
# Formatting script for songlations.livejournal.com posts.
# Place the raw song text into song_input.txt
# Run program. Edited text will be in song_output.txt
# Author: Cairaguas Gonzalez <cairaguas@gmail.com>
# Last modified: November 15, 2016
use warnings; #this replaces -w in Perl 5.6.x and above
use Win32::Console::ANSI; # Windows-only module for ANSI escape sequences in the console
# http://stackoverflow.com/questions/700187/unicode-utf-ascii-ansi-format-differences
print "\nSONG_TRANSFORM by Cairaguas: \nIf WordPad.exe is not available, choose NO for all prompts to \nopen files and instead manually open and edit the text files \nin the folder with the Perl script.\n\n";
print "Open song_input.txt now? (y=1/n=0)\n";
$opennow = <>;
######################################################################
# INITIAL SONG INPUT AND DECISION OF TRANSFORMATION TYPE
if ($opennow == 1)
{
system("start wordpad.exe song_input.txt"); # This opens the input file in the foreground using WordPad.
print "Opening... \n";
}
else
{ print "Continuing with existing song_input.txt.\n";
}
print "Input song text, save, and close WordPad.
\nType 1 for simple transformation (just capitalization).\nType 3 for Songlations format transformation.\nType 5 for changing special characters to HTML code.\n";
$continue = <>; #This requests input from user.
#Press ENTER when input is finished.
open (IN, "<:encoding(Windows-1252)", "song_input.txt") or die("Could not open file song_input.txt from Perl folder."); #open file for reading
open (OUT, ">:encoding(Windows-1252)", "song_output.txt") or die("Could not open song_output.txt from Perl folder."); #open file for writing
######################################################################
# SIMPLE TRANSFORMATION
if ($continue == 1)
{
while ($line = <IN>)
{
chomp $line; #removes line break at end of string
if ($line =~ m/^http/)
{print OUT "$line\n";}
# do not capitalize url
# ^ is an anchor character, requires match to be at the beginning
elsif (($line =~ m/^[^a-zA-Z]/) && ($line !~ m/^\<[ibu]\>/))
# first char not letter character, not tag
# could also use \W to match a non-word character
{
$first = substr($line,0,1); # first character
$rest = substr($line,1); # rest of the string
print OUT "$first\u$rest\n";
}
elsif ($line =~ m/^\<[ibu]\>[^a-zA-Z]/)
# if line begins with tag preceding non-letter character
# not currently working for quotation marks and not sure why
{
$first = substr($line,0,4); # grab the tag and next character
$rest = substr($line,4); # grab the rest
print OUT "$first\u$rest\n";
}
elsif ($line =~ m/^\<[ibu]\>[a-zA-Z]/)
# if line begins with italics or bold or underline tag
{
$first = substr($line,0,3); # grab the <i> or <b> or <u> tag
$rest = substr($line,3); # grab the rest
print OUT "$first\u$rest\n";
}
else
{print OUT "\u$line\n";} # \u makes next letter uppercase
}
}
######################################################################
# COMPLEX TRANSFORMATION
elsif ($continue == 3)
{
print OUT ", English translation of lyrics \n\n";
print OUT "<b>\"\"<\/b> \n";
print OUT "Album: <i><\/i> \n";
print OUT "Style: \n";
print OUT "Country: \n\n";
print OUT "<b><u>Listen:<\/u><\/b>\r\n\n\n";
print OUT "<lj-spoiler text=\"Expand embedded video\"><\/lj-spoiler>\n\n";
print OUT "<b><u>Translation:<\/u><\/b> \n\n \n\n";
$prevline = ""; # start as if the previous line were blank (avoids an uninitialized-value warning)
while ($line = <IN>)
{
if (($prevline !~ m/[a-zA-Z]/) && ($line !~ m/(Chorus|chorus)/) && ($prevline !~ m/^(---)/))
#If the previous line was empty (and was not a --- dash line) and the current line isn't a chorus line
#The dash check prevents a double <i> after a dash line
{ print OUT "<i>";
}
if (($line !~ m/[a-zA-Z]/) && ($prevline =~ m/[a-zA-Z]/)) #If the current line is empty and the previous line was not.
{ print OUT "</i>\n$line";
#to do: rewrite so it doesn't do this before Chorus line
}
else #the current line isn't a line break
{
if ($prevline =~ m/[a-zA-Z]/)
{print OUT "\n";}
#adds line break as long as it's not after the <i> added above
chomp($line);
if ($line =~ m/^(http|---)/)
{print OUT "\n$line\n";}
# do not capitalize url; does not add italics to dash lines
# ^ is an anchor character, requires match to be at the beginning
elsif ($line =~ m/^(Chorus|chorus)/)
{
print OUT "\n<b>Chorus<\/b>\n<i>";
}
# encloses "chorus" line in bold tags and begins italics tag
# warning: removes anything else on the same line
elsif (($line =~ m/^[^a-zA-Z]/) && ($line !~ m/^\<[ibu]\>/) && ($line !~ m/^(Chorus|chorus)/))
# first char not letter character, not tag
# could also use \W to match a non-word character
{
$first = substr($line,0,1); # first character
$rest = substr($line,1); # rest of the string
print OUT "$first\u$rest";
}
elsif ($line =~ m/^\<[ibu]\>[^a-zA-Z]/)
# if line begins with tag preceding non-letter character
# not currently working for quotation marks and not sure why
{
$first = substr($line,0,4); # grab the tag and next character
$rest = substr($line,4); # grab the rest of the line
print OUT "$first\u$rest";
}
elsif ($line =~ m/^\<[ibu]\>[a-zA-Z]/)
# if line begins with italics or bold or underline tag
{
$first = substr($line,0,3); # grab the <i> or <b> or <u> tag
$rest = substr($line,3); # grab the rest
print OUT "$first\u$rest";
}
else
{print OUT "\u$line";} # \u makes next letter uppercase
}
$prevline = $line; #still inside the while loop
last if eof(IN); #breaks the while loop
} #ends while loop
print OUT "<\/i>\n\n" . "<b><u>Translation Notes:<\/u><\/b>\n\n<\/lj-cut>";
} #ends complex transformation script block
######################################################################
# ACCENT AND SPECIAL CHARACTER TO HTML CODE
elsif ($continue == 5)
{
while ($line = <IN>)
{
$line =~ s/\xE1/&aacute;/g;
$line =~ s/\xE9/&eacute;/g;
$line =~ s/\xED/&iacute;/g;
$line =~ s/\xF3/&oacute;/g;
$line =~ s/\xFA/&uacute;/g;
$line =~ s/\xF1/&ntilde;/g;
$line =~ s/\xA1/&iexcl;/g;
#see: https://en.wikipedia.org/wiki/Windows-1252
print OUT $line;
# last if eof(<IN>);
}
}
else {print OUT "else was used 456"}; # fallback: the option typed was not 1, 3, or 5
######################################################################
# FINISHED TRANSFORMATIONS
print "\nTransforming...\nDone. \nTransformed text is located in song_output.txt. \n\n";
close (IN);
close (OUT);
print "Open song_output.txt now? (y=1/n=0).\n";
$opennow = <>;
if ($opennow == 1)
{
system("start wordpad.exe song_output.txt");
# This opens the output file in the foreground using WordPad.
}
else
{
print "Goodbye.\n";
}
# http://perldoc.perl.org/perlre.html#Regular-Expressions
# http://www.pjb.com.au/comp/diacritics.html
# http://users.cs.cf.ac.uk/Dave.Marshall/PERL/node79.html
# http://www.w3schools.com/charsets/ref_utf_dingbats.asp
# different encodings: http://htmlpurifier.org/docs/enduser-utf8.html
Pro: This program does not ruin special characters or output them as question marks or boxes. It leaves special characters alone and transfers them unaltered to the output file.
Con: This program does not capitalize accented letters (for example, an initial é is not changed to É), so you need to check the program output manually.
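One possible way around this limitation is to let Perl decide what counts as a letter instead of listing a-z and A-Z. The sketch below is not part of song_transform.pl: the helper name capitalize_first_letter is hypothetical, it skips any leading tag rather than only <i>, <b>, and <u>, and it assumes Perl 5.12 or later for the unicode_strings feature.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode case rules for code points 128-255 too

# Hypothetical helper (an assumption, not the script's own code):
# uppercase the first letter of a line, skipping any leading tags and
# non-letter characters such as an inverted question mark or a quote.
sub capitalize_first_letter {
    my ($line) = @_;
    return $line if $line =~ m/^http/;   # leave URLs untouched
    # \p{L} matches any letter, accented or not; \P{L} matches any non-letter.
    # The possessive *+ keeps the regex from backtracking into a tag name.
    $line =~ s/^((?:<[^>]*>|\P{L})*+)(\p{L})/$1\u$2/;
    return $line;
}

print capitalize_first_letter("<i>hola mundo"), "\n";   # prints "<i>Hola mundo"
```

With this approach a line such as "él viene" would come out as "Él viene", because é is matched as a letter rather than as a special character.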
How to use on Windows:
First download and install Perl (free). Once installed, copy the code above into Notepad and save-as filetype “All Files” (to avoid saving with the .txt extension) and name the file song_transform.pl in the folder where you installed Perl (probably C:\Perl or C:\Perl64). To bring up the program, open the Windows command box by using WindowsKey + R and typing cmd into the “Run” window that comes up. Navigate to the folder where you installed Perl by typing something like this into the command line:
cd C:\Perl
…or wherever your Perl folder is located. Hit enter. Now type this into the command line:
perl song_transform.pl
The program will run. It will ask whether you want to open the input file; follow the instructions. At the end, it will ask whether you want to open the output file. If you don’t want to open the input and output files from the command window now, you can open them later by finding them in your Perl folder. The script as listed above reads song_input.txt and writes song_output.txt; to use song_input.rtf and song_output.rtf instead, change the file names in the script. An .rtf file is a rich text file, a widely supported format that still keeps font, italics, bold, special characters, and other basic formatting that gets lost in a plain .txt file. To work with a different file name or extension, just switch it in the script, keeping in mind that the script reads the file line by line as text, so a format like .docx would need to be saved as text first.
How to use on Mac:
Mac OS X already has Perl pre-installed. Skip the installation step and just save the script, then run it. The input and output files will be generated in the same folder you saved the script. However, you might need to manually create the song_input.rtf file and tell the program no when it asks to open it from within the script. The script will still run on the rich text file, but it will skip the part that opens WordPad. I haven’t tested that part on a Mac, so I don’t know if it produces an error. After the script is finished, open the output file song_output.rtf manually as well.
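For anyone who wants the prompts to work on a Mac as well, here is a minimal sketch of choosing the open command by platform. It is untested against the full script and rests on a few assumptions: the helper name open_command is hypothetical, the Mac branch uses the standard open -e command (which opens a file in TextEdit), and any other platform falls back to opening the file manually. The use Win32::Console::ANSI line near the top of the script would most likely also have to be removed or guarded on a Mac, since that module is Windows-only.

```perl
use strict;
use warnings;

# Hypothetical helper (not in song_transform.pl): build the command that
# opens $file for editing. $os defaults to Perl's built-in $^O variable.
sub open_command {
    my ($file, $os) = @_;
    $os //= $^O;
    return qq{start wordpad.exe "$file"} if $os eq 'MSWin32';  # same as the script
    return qq{open -e "$file"}           if $os eq 'darwin';   # -e means TextEdit
    return;                                  # unknown platform: no command
}

my $cmd = open_command('song_input.txt');
if (defined $cmd) {
    print "Would run: $cmd\n";
    # system($cmd);   # uncomment to actually open the file
}
else {
    print "Please open song_input.txt manually.\n";
}
```

Returning the command as a string instead of running it directly keeps the platform check in one place, so both system() calls in the script could share it.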
Perl code edit notes:
Modified: November 15, 2016
Cleaned up code formatting and improved the special character reading for Windows 10. If you don’t have wordpad.exe, either replace wordpad.exe in the system() calls with another program that can open the input and output files, or answer no at the prompts asking to open them. The code will still run and the output will still be created, but you will need to open the files manually.
Modified: July 1, 2012
Now the program can open the necessary input and output WordPad files in the foreground, directly from the command window. If you don’t want this, comment out the system() calls (add a # before each one) or answer no to the prompt “Open song_input.rtf now? (y=1/n=0)” at the beginning and to the prompt “Open song_output.rtf now? (y=1/n=0)” at the end.
Modified: January 26, 2011
Now transforms the first letter after a special character (such as ¿ or ¡) and after the italics tag.
Published: November 12, 2010
Very simple. Transformed first letter of every line into uppercase except for lines that had a url or special characters in the beginning.
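The special character handling from the 2016 version (option 5 in the script) could also be written as a lookup table, which makes it easier to add more characters later. This is only a sketch under a few assumptions: the names %entity and to_entities are hypothetical, the table holds just the seven characters the script already handles, and the demonstration reads from an in-memory string instead of song_input.txt.

```perl
use strict;
use warnings;

# The seven characters handled by the script's HTML mode, keyed by code point.
my %entity = (
    "\x{E1}" => '&aacute;', "\x{E9}" => '&eacute;', "\x{ED}" => '&iacute;',
    "\x{F3}" => '&oacute;', "\x{FA}" => '&uacute;', "\x{F1}" => '&ntilde;',
    "\x{A1}" => '&iexcl;',
);

# Hypothetical helper: replace each character found in the table with its
# HTML entity and leave every other character as it is.
sub to_entities {
    my ($text) = @_;
    $text =~ s/(.)/exists $entity{$1} ? $entity{$1} : $1/ge;
    return $text;
}

# Demonstration on an in-memory "file" holding Windows-1252 bytes,
# decoded with the same encoding layer the script uses for its input.
my $bytes = "\xA1Qu\xE9 ni\xF1o!\n";
open(my $in, '<:encoding(Windows-1252)', \$bytes) or die "Cannot open: $!";
while (my $line = <$in>) {
    print to_entities($line);   # prints "&iexcl;Qu&eacute; ni&ntilde;o!"
}
close($in);
```

Adding accented capitals would then be a matter of adding entries such as "\x{C9}" => '&Eacute;' to the table.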