Thread ID: 113561 2010-10-25 23:14:00 need program or script, that can fix html layout, and bulk update html tags to xhtml Morgenmuffel (187) Press F1
Post ID Timestamp Content User
1147643 2010-10-25 23:14:00 Hi all

Problem 1)

I regularly have to deal with html files that have been generated and may have missing or incorrectly nested tags, and often the entire site is generated on one line. What I need is some function or program that will just lay the code out, indented etc., so it's readable (but not alter any tags, as I need to find the errors to track back).
Currently I use a few simple find and replaces
eg find </tr><tr> replace with </tr>\n<tr> (with use regex ticked)
to get readability, but surely something this simple must be automated somewhere

Problem 2)

I also have a large site that has a huge variety of tags, some html, some xhtml
I just want to change them all to xhtml
again, find and replace works fine for <br> to <br />
but trying to convert image tags and input tags is way more difficult, especially finding a tool that alters pages in bulk


any suggestions would be appreciated
Morgenmuffel (187)
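
A minimal sketch of how the </tr><tr> find-and-replace above could be generalised to every pair of adjacent tags, assuming a POSIX-style sed is available. The function name split_tags is made up for illustration. It only inserts line breaks (no indentation), and it never adds, removes, or reorders tags, so any nesting errors stay visible:

```shell
# Put each tag on its own line by inserting a newline wherever one tag
# directly follows another ("><"). Tags themselves are left untouched.
split_tags() {
    # $1: path to an HTML file; the result is written to stdout
    sed 's/></>\
</g' "$1"
}
```

Usage would be something like split_tags page.html > page.readable.html. Note that it also splits any "><" found inside <pre> blocks or scripts, so it is meant for reading the markup, not for publishing the result.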
1147644 2010-10-25 23:36:00 I feel your pain... I have tried a few different methods, including Dreamweaver's cleanup functions. Most introduce more problems, I have found. I usually end up resorting to doing it by hand and using find/replace. It's the only method that works 100% :p SoniKalien (792)
1147645 2010-10-25 23:39:00 I used to use an online html validation site (can't recall the name) for my sites/blogs, and it would show the necessary corrections required.

I also did tutorials for html/xhtml by html expert Jennifer Kyrnin at About.com. She might have a solution, e.g. html to xhtml conversions. (webdesign.about.com)
kahawai chaser (3545)
1147646 2010-10-26 04:39:00 Problem 1) Try Tidy (en.wikipedia.org) - in my experience it does a great job, and can fix almost anything you throw at it.


Problem 2) Again, Tidy is the go-to tool for this job, it'll make light work of the task.

You could also use a regex search / replace tool, although this will take you a fair bit longer than Tidy will.
Erayd (23)
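
For the regex search/replace route mentioned above, here is a rough sketch using sed -E (supported by both GNU and BSD sed). The function name close_void_tags and the choice of tags are assumptions for illustration. It assumes lowercase tag names and no ">" inside attribute values, and it leaves tags that are already self-closed alone:

```shell
# Rewrite unclosed <br>, <hr>, <img ...> and <input ...> tags as XHTML-style
# self-closing tags. Reads HTML on stdin, writes the result to stdout.
# Assumes lowercase tag names and no '>' inside attribute values.
close_void_tags() {
    sed -E 's#<(br|hr|img|input)([[:space:]][^>]*[^/>[:space:]])?[[:space:]]*>#<\1\2 />#g'
}
```

Usage would be something like close_void_tags < old.html > new.html, wrapped in a find loop to cover a whole site. A regex cannot handle every malformed tag, so it is worth diffing a few converted pages before trusting it in bulk.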
1147647 2010-10-27 04:27:00 Can tidy do a fix as a batch process?

and does it have a pretty interface somewhere, as last time I checked it was command line and I ended up utterly lost


also, and on a different tack

is there a tool that can scan sections of a live site and flag those pages that have missing tags, specifically div tags? This site seems to have an abundance of <div> tags, but they don't seem to have bothered with </div> tags as much. I guess when the site was table based it wasn't an issue, but now that I am trying to move it towards css, it's playing havoc

or alternately i can scan the pages on my pc, which is probably better
Morgenmuffel (187)
1147648 2010-10-27 04:39:00 There's the web developer toolbar (www.snapfiles.com), which I used a while back for live sites, and an add-on for Firefox. Don't know about the current version. kahawai chaser (3545)
1147649 2010-10-27 05:42:00 "Can tidy do a fix as a batch process?"

If you script it, yes. If I'm doing batch repairs on a ton of files I usually pipe the output of find into a tidy loop - something like this would batch-convert all html files in or below the current directory into valid xhtml:
# -m edits each file in place; -e is left out because it makes tidy report errors only and write nothing
find . -type f -name "*.html" | while IFS= read -r f; do
tidy -q -m -asxml "$f"
done

Edit: If you only need to run one command on the file (i.e. tidy), the above can be expressed more succinctly as:
find . -type f -name "*.html" -exec tidy -q -m -asxml {} \;


"...and does it have pretty interface somewhere as last time i checked it was command line and i ended up utterly lost"

Tidy itself is a CLI program, but there are several GUI frontends available for it - a few are listed on Tidy's SF project page (http://tidy.sourceforge.net/).


"...is there a tool that can scan sections... and flag those pages that have missing tags, specifically div tags... on my pc, which is probably better"

Tidy can fix this for you, just point it at the offending files.

If you're happy to have a lot of stuff flagged (not just missing </div> tags), the W3C Markup Validator (http://validator.w3.org/) will do this for you.

I don't know of any tool that will only flag for missing </div> tags, but if you really need this rather than the in-place fix that tidy provides or W3's error flagging, let me know and I'll write one for you.
Erayd (23)
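
A rough sketch of the kind of "only flag missing </div> tags" tool discussed above. This is not the tool offered in the thread, just an illustration: the function name flag_unbalanced_divs is made up, and it assumes a grep that supports -o (GNU and BSD grep both do). It only compares counts per file, so it can say that a page is unbalanced but not where the missing tag belongs:

```shell
# Print every .html file under a directory whose number of opening <div ...>
# tags differs from its number of closing </div> tags.
flag_unbalanced_divs() {
    # $1: directory to scan (defaults to the current directory)
    find "${1:-.}" -type f -name '*.html' | while IFS= read -r f; do
        # grep -o prints one match per line, so wc -l gives the tag count
        opens=$(grep -oiE '<div([[:space:]][^>]*)?>' "$f" | wc -l | tr -d '[:space:]')
        closes=$(grep -oiE '</div[[:space:]]*>' "$f" | wc -l | tr -d '[:space:]')
        if [ "$opens" -ne "$closes" ]; then
            printf '%s: %d <div>, %d </div>\n' "$f" "$opens" "$closes"
        fi
    done
}
```

Run it as flag_unbalanced_divs /path/to/site on a local copy of the pages. Tags that are split across lines or sit inside HTML comments will skew the counts, so treat the output as a list of pages to inspect.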
1147650 2010-10-27 07:09:00 Can it replace font tags consistently?

What I mean is, from testing with a few front ends

it will replace



<font size="1">Hello darkness my old friend</font> <font size="3">Watch out where the Huskies go</font>


with



span.c2 {font-size: 80%}
span.c1 {font-size: 70%}

<span class="c1">Hello darkness my old friend</span> <span class="c2">Watch out where the Huskies go</span>



But on another page that has say



<font size="7">Yellow Matter Custard</font>
<font size="1">Hello darkness my old friend</font> <font size="3">Watch out where the Huskies go</font>

when I process it, it comes up


span.c3 {font-size: 80%}
span.c2 {font-size: 70%}
span.c1 {font-size: larger}

<span class="c1">Yellow Matter Custard</span>
<span class="c2">Hello darkness my old friend</span>
<span class="c3">Watch out where the Huskies go</span>


Now I want to link it to an external css, but if the styles for each page are different, then I am stuffed.

Is there a way for me to tell it that it needs to be consistent across pages?

edit--------------------------
I haven't tried batch processing yet
Morgenmuffel (187)
1147651 2010-10-27 07:26:00 My understanding is that it will name styles in the order they are required.

I'm not sure whether this will work, but have you tried specifying multiple input files for a single run of Tidy? Something like this:
# "{} +" makes find pass as many file names as will fit to each tidy invocation
find . -type f -name "*.html" -exec tidy -q -m -asxml {} +
Erayd (23)
1147652 2010-10-28 02:05:00 I tried batching it, but it seems to work on a per-document basis.

So in the end I just did a find and replace on each tag
Morgenmuffel (187)
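
The "find and replace on each tag" approach could be sketched roughly like this, so that the class name depends only on the font size and comes out the same on every page. The function name font_to_span and the "fs-" class prefix are invented for illustration. It assumes size is the only attribute on the font tag and that every </font> on the page closes such a tag:

```shell
# Replace <font size="N">...</font> with <span class="fs-N">...</span>, so the
# class name depends only on the size value and is identical on every page.
# Reads HTML on stdin, writes to stdout. The "fs-" prefix is made up here.
font_to_span() {
    sed -E -e 's#<font size="([1-7])">#<span class="fs-\1">#g' \
           -e 's#</font>#</span>#g'
}
```

With names like these, one external stylesheet can define the fs-1 to fs-7 classes once for the whole site, which avoids the per-page c1/c2/c3 numbering described above.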