| Thread ID: 109215 | 2010-04-29 01:59:00 | Regular Expression help - only preserve contents of Anchor tags | Morgenmuffel (187) | Press F1 |
| Post ID | Timestamp | Content | User | ||
| 880861 | 2010-04-29 01:59:00 | Hi all, I am currently trying to get info out of a former FrontPage site, which is a mess, to put it bluntly. Basically, all I want out of the pages is the anchor links, like the one below: <a href="/files/sopwith/camel.htm">Biggles and Algie</a> While finding them should be easy, the sheer number of extraneous tags is making the going painful, and I can't get my regular expressions working. |
Morgenmuffel (187) | ||
| 880862 | 2010-04-29 02:09:00 | This works to find the links, but what I want it to do is remove everything else, and I am blowed if I can figure it out. I also can't get the code below to work in Notepad++, though it works in an elderly version of Dreamweaver: <a\b[^>]*>(.*?)</a> |
Morgenmuffel (187) | ||
| 880863 | 2010-04-29 02:40:00 | I take that back: the above code only finds some of the links, not all, because it misses any that have line breaks in them, e.g. <a href="/files/sopwith/camel.htm">Biggles and Algie </a> Dammit, my brain is now officially hurting. |
Morgenmuffel (187) | ||
| 880864 | 2010-04-29 03:17:00 | Eureka-ish: <a\b[^>]*>([\s\S]+?)</a> Probably not the most elegant, and I still can't work out how to get rid of all the other text on the page, or pipe the result into a new file on Windows. |
Morgenmuffel (187) | ||
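
One way to handle the two open questions (dropping everything except the anchors, and getting the result into a new file) is to skip search-and-replace and let a short script collect the matches instead. The sketch below is only an illustration, not something from the thread: it reuses the `[\s\S]+?` pattern from the last post, and the function names, the UTF-8 encoding, and the case-insensitive flag (for pages that write `<A HREF=...>`) are all assumptions.

```python
import re
from pathlib import Path
from typing import List

# Same pattern as in the post above. [\s\S] matches any character,
# including newlines, so anchors broken across lines are still found.
# IGNORECASE is an assumption, added in case the pages use <A HREF=...>.
ANCHOR_RE = re.compile(r"<a\b[^>]*>([\s\S]+?)</a>", re.IGNORECASE)


def extract_anchors(html: str) -> List[str]:
    """Return each complete <a ...>...</a> element, in document order."""
    anchors = []
    for match in ANCHOR_RE.finditer(html):
        # Collapse runs of whitespace (including line breaks) to single
        # spaces so each anchor fits on one output line.
        anchors.append(" ".join(match.group(0).split()))
    return anchors


def save_anchors(src: str, dst: str) -> int:
    """Read the page at src, write one anchor per line to dst, return the count.

    The encoding is a guess; undecodable bytes are replaced rather than
    raising an error.
    """
    html = Path(src).read_text(encoding="utf-8", errors="replace")
    anchors = extract_anchors(html)
    Path(dst).write_text("\n".join(anchors) + "\n", encoding="utf-8")
    return len(anchors)
```

Calling something like `save_anchors("camel.htm", "links.txt")` (both file names are placeholders) would leave only the anchors in the new file. A regex like this is fine for a one-off cleanup, but it can trip over oddities such as a `>` inside an attribute value, so a proper HTML parser is the safer route for anything more involved.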