Want to use curl to find out how many times a domain links to my server
Posted by serialboxhpc, 09-02-2008, 06:46 PM

Hey, I was hoping to get some help using curl, or possibly another tool, to find out how many times a specific domain links to my server.

We are receiving thousands of hits to our main website from a questionable website, and we want to see exactly where on that site these links are. Searching Google for link:urdomain.com did not really produce the results we wanted.

I've been playing with curl and trying to come up with the right switches to make this work, but I've run into a couple of problems. This is what I've been using so far:

wget -rkp -l3 -np -nH -X /cgi-bin http://scanneddomain.com

I then use grep to count how many times our link appears on the site. The problem is that the site's main page uses relative paths, so links are written as ../../directory/filename.html. When I run the wget command above I get errors like the ones below (I've replaced the actual domain with "scanneddomain.com"):

--18:33:27-- http://scanneddomain/../bg_themes/1/top_left.gif
=> `%2E%2E/bg_themes/1/top_left.gif'
Reusing existing connection to scanneddomain.com:80.
HTTP request sent, awaiting response... 400 Bad Request
18:33:27 ERROR 400: Bad Request.

My questions are:
a) How can I scan the site and resolve the ../ and ../../ prefixes so that it uses the correct base? I've tried adding --base= and --force-html, but that did not seem to work. Any help would be appreciated.
b) Is this really the best way to do this?
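One way to sidestep question a) is to not mirror the site at all: parse each page's HTML, resolve every relative link against the URL of the page it came from (as a browser would), and count the ones that point at your domain. The log above shows the request going out with the literal `..` still in the path, which the server rejected with a 400. The sketch below is a minimal Python illustration under some assumptions: `example.com` is a hypothetical stand-in for your own domain, only `<a href>` and `<img src>` are checked (links built in JavaScript or CSS are not seen), and fetching the pages is left out. `urljoin` follows RFC 3986 in Python 3.5 and later, so a `..` that would climb above the site root is simply dropped.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects every <a href> and <img src> on a page as an absolute URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue  # attribute present without a value
            if (tag == "a" and name == "href") or (tag == "img" and name == "src"):
                # urljoin resolves "../../directory/filename.html" against the
                # page URL (RFC 3986 rules in Python 3.5+), dropping any ".."
                # that would climb above the site root.
                self.links.append(urljoin(self.page_url, value))


def count_links_to(html, page_url, my_domain):
    """Count links in `html` pointing at `my_domain` or its subdomains."""
    my_domain = my_domain.lower()
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    hits = Counter()
    for link in collector.links:
        host = (urlsplit(link).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            hits[link] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical page on the remote site; example.com stands in for your domain.
    page = (
        '<a href="http://www.example.com/promo">promo</a>'
        '<a href="../../directory/filename.html">local</a>'
        '<img src="http://example.com/logo.gif">'
    )
    hits = count_links_to(page, "http://scanneddomain.com/a/b/index.html", "example.com")
    for url, n in hits.most_common():
        print(n, url)
    print("total:", sum(hits.values()))
```

To use it on real pages, feed it the HTML of each page (for example the files wget saved, or the result of `urllib.request.urlopen`), always passing the page's original URL as `page_url` so that relative links resolve against the right base. This replaces the grep step, and it counts links per target URL rather than raw text matches.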
Posted by epcmedia, 09-03-2008, 09:33 AM

Rather than using wget, have you looked at the referring URLs in your server logs? That should give you a map of which links are sending people to your site.

Also, there is a free Windows link-checking program called Xenu (I think). You should be able to spider a site with it and filter out your domain's URLs.

What is your purpose for doing this? If we knew that, we could help you find a solution.

Hope that helps,
EPCMEDIA
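The referrer-log approach can be sketched in a few lines of Python. This assumes the web server writes the Apache/NCSA "combined" log format, where the referring URL is the second-to-last quoted field; lines in other formats (for example a variant with a leading virtual-host field) or with escaped quotes inside the request are skipped rather than parsed. The log path and script name in the usage comment are hypothetical examples.

```python
import re
import sys
from collections import Counter
from urllib.parse import urlsplit

# Apache/NCSA "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def referers_from(lines, source_domain):
    """Count hits per referring page on `source_domain` or its subdomains."""
    source_domain = source_domain.lower()
    pages = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not in combined format; skip it
        referer = m.group("referer")  # "-" when the browser sent no referer
        host = (urlsplit(referer).hostname or "").lower()
        if host == source_domain or host.endswith("." + source_domain):
            pages[referer] += 1
    return pages


if __name__ == "__main__":
    # Usage (path is an example): python referers.py /var/log/apache2/access.log scanneddomain.com
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for url, n in referers_from(log, sys.argv[2]).most_common():
                print(n, url)
```

Because each counted entry is a full referring URL, the output lists the exact pages on the other site that send visitors to you, ranked by hit count, which answers "where are these links" without crawling the site.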