Thread ID: 109215 2010-04-29 01:59:00 Regular Expression help - only preserve contents of Anchor tags Morgenmuffel (187) Press F1
Post ID Timestamp Content User
880861 2010-04-29 01:59:00 Hi all

I am currently trying to get info out of a former FrontPage site which, to put it bluntly, is a mess.

Basically, all I want out of the pages is the anchor links, like the one below:

<a href="/files/sopwith/camel.htm">Biggles and Algie</a>

While finding them should be easy, the sheer number of extraneous tags is making the going painful, and I can't get my regular expressions working.
Morgenmuffel (187)
880862 2010-04-29 02:09:00 This works to find the links, but what I want it to do is remove everything else, and I am blowed if I can figure it out. I also can't get the code below to work in Notepad++, but it works in an elderly version of Dreamweaver.



<a\b[^>]*>(.*?)</a>
Morgenmuffel (187)
880863 2010-04-29 02:40:00 I take that back. The above code is only finding some of the links, not all of them: it isn't finding any that have line breaks in them, e.g.
<a href="/files/sopwith/camel.htm">Biggles and Algie
</a>
Dammit, my brain is now officially hurting.
Morgenmuffel (187)
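
The line-break problem above is what you would expect from most regex flavours: by default `.` does not match a newline, so `(.*?)` can never reach a `</a>` that sits on the next line. A minimal Python sketch of that behaviour follows. The sample string is made up for illustration, and Python's `re` module is only standing in for the editors mentioned in the thread, whose regex engines may behave differently.

```python
import re

# Made-up sample: one anchor whose closing tag is on the next line.
html = '<p>x</p><a href="/files/sopwith/camel.htm">Biggles and Algie\n</a>'

pattern = r'<a\b[^>]*>(.*?)</a>'

# Default behaviour: "." stops at the newline, so the anchor is missed.
plain = re.findall(pattern, html)

# With re.DOTALL, "." also matches newlines and the anchor is found.
dotall = re.findall(pattern, html, flags=re.DOTALL)

print(plain)   # no matches
print(dotall)  # the anchor text, including the trailing newline
```

Swapping `.` for a character class such as `[\s\S]` has the same effect as `re.DOTALL` without needing a flag, which is handy in tools that don't expose one.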
880864 2010-04-29 03:17:00 Eureka-ish


<a\b[^>]*>([\s\S]+?)</a>


Probably not the most elegant, and I still can't work out how to get rid of all the other text on the page, or pipe the result into a new file on Windows.
Morgenmuffel (187)
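
One way to get rid of everything else is to turn the problem around: rather than deleting the surrounding text, collect only the matches and write them to a new file. The following is a rough Python sketch of that idea, not a tested solution for the actual site. The function names and the file names in the usage note are hypothetical, and the case-insensitive flag is an assumption in case some tags are written as `<A HREF=...>`.

```python
import re
from pathlib import Path

# Same pattern as in the thread, but matching the whole element
# (no capture group), and ignoring case so <A ...>...</A> is caught too.
ANCHOR_RE = re.compile(r'<a\b[^>]*>[\s\S]+?</a>', re.IGNORECASE)


def extract_anchors(html: str) -> list[str]:
    """Return every complete <a ...>...</a> element found in html."""
    return ANCHOR_RE.findall(html)


def save_anchors(src: Path, dest: Path) -> int:
    """Read src, write one anchor per entry to dest, return the count."""
    # errors="replace" is a guess: old pages are often not valid UTF-8.
    html = src.read_text(encoding="utf-8", errors="replace")
    anchors = extract_anchors(html)
    dest.write_text("\n".join(anchors) + "\n", encoding="utf-8")
    return len(anchors)
```

Calling something like `save_anchors(Path("page.htm"), Path("links.txt"))` would then leave only the anchors in `links.txt`. Note that the pattern also picks up `<a name="...">` bookmarks, and a regex will trip over unusual markup; for anything messier, a real HTML parser such as Python's built-in `html.parser` is likely the safer route.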