<?php
// Returns a big old hunk of JSON (decoded to an array) from a non-private IG account page.
// Returns NULL if the page cannot be fetched or no longer contains window._sharedData.
function scrape_insta($username) {
    $insta_source = file_get_contents('http://instagram.com/'.$username);
    if ($insta_source === FALSE) {
        return NULL;
    }
    $shards = explode('window._sharedData = ', $insta_source);
    if (!isset($shards[1])) {
        return NULL;
    }
    $insta_json = explode(';</script>', $shards[1]);
    $insta_array = json_decode($insta_json[0], TRUE);
    return $insta_array;
}

// Supply a username
$my_account = 'cosmocatalano';

// Do the deed
$results_array = scrape_insta($my_account);

// An example of where to go from there (NULL if any key along the path is missing)
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0] ?? NULL;
if ($latest_array === NULL) {
    echo 'Could not find the latest photo; the page structure may have changed.<br/>';
} else {
    echo 'Latest Photo:<br/>';
    echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
    echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
}

/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
// Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
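The script assumes that both the `window._sharedData = ` and `;</script>` markers are present in the page. As a minimal sketch, the parsing step can be separated from the fetch so it can be tried on a saved copy of the page. The name `extract_shared_data` is hypothetical and not part of the gist, and the sketch only works if the page still embeds its data this way.

```php
<?php
// Hypothetical helper (not in the original gist): pull the _sharedData JSON
// out of already-fetched HTML. Returns null when a marker is missing or the
// JSON does not decode to an array.
function extract_shared_data(string $html): ?array {
    $start_marker = 'window._sharedData = ';
    $start = strpos($html, $start_marker);
    if ($start === false) {
        return null; // marker missing: layout changed or a login wall was served
    }
    $start += strlen($start_marker);
    $end = strpos($html, ';</script>', $start);
    if ($end === false) {
        return null; // closing marker missing
    }
    $decoded = json_decode(substr($html, $start, $end - $start), true);
    return is_array($decoded) ? $decoded : null;
}

// Usage with a made-up page fragment:
$sample = '<script>window._sharedData = {"entry_data":{"ProfilePage":[]}};</script>';
var_dump(extract_shared_data($sample) !== null); // bool(true)
```

Keeping the parser free of network calls means a change in the page layout shows up as a plain `null` rather than a chain of notices.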
Looks like Instagram is blocking scraping via file_get_contents/curl. Anyone got a solution? I wonder how online web scraping tools manage to work without getting blocked.
Hi 'Cosmocatalano' [ nomen est omen?] :) ,
This is a very interesting solution. I have only tried it on localhost, so I have no problem with CORS. But the array names seem to have changed completely; the only one that is still the same seems to be 'entry_data'. Is the changed response still usable with different array 'names'? That would be very interesting.
Best regards and thanks
Axel Arnold Bangert
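Since the key names can change, one way to probe the decoded array is to list its top-level keys and walk a key path that stops at the first missing key. This is only a sketch: `array_path` is a hypothetical helper, and the sample data is made up to mirror the path used in the gist, which may no longer match what Instagram returns.

```php
<?php
// Hypothetical helper: follow a list of keys into a nested array and return
// $default as soon as one key is missing.
function array_path(array $data, array $path, $default = null) {
    $node = $data;
    foreach ($path as $key) {
        if (!is_array($node) || !array_key_exists($key, $node)) {
            return $default;
        }
        $node = $node[$key];
    }
    return $node;
}

// Made-up data shaped like the gist's example path, with no media nodes.
$results_array = ['entry_data' => ['ProfilePage' => [['user' => ['media' => ['nodes' => []]]]]]];

// The top-level keys show what the response currently contains.
print_r(array_keys($results_array));

// A missing key yields the default instead of an "undefined index" notice.
var_dump(array_path($results_array, ['entry_data', 'ProfilePage', 0, 'user', 'media', 'nodes', 0]));
```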
> Looks like Instagram is blocking scraping via file_get_contents/curl. Anyone got a solution? I wonder how online web scraping tools manage to work without getting blocked.
I guess it just takes enough good proxies. I am now using https://rapidapi.com/neotank/api/simple-instagram-api to avoid dealing with proxies, because they fail all the time (for Instagram) and get a 302 redirect to the login page.
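The 302 redirect to the login page can be detected instead of being silently followed. The sketch below rests on several assumptions: the names `is_login_redirect` and `fetch_profile_html` are hypothetical, the `/accounts/login` path and the `User-Agent` value are placeholders, and none of this gets around the block itself. It only makes the failure visible.

```php
<?php
// Hypothetical helper: given raw response header lines (the format PHP puts
// in $http_response_header), report whether the response is a redirect whose
// Location contains $login_path. The default path is an assumption.
function is_login_redirect(array $headers, string $login_path = '/accounts/login'): bool {
    $is_redirect = false;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A new status line: remember whether it is a redirect status.
            $is_redirect = in_array($m[1], ['301', '302', '303', '307', '308'], true);
        } elseif ($is_redirect && stripos($line, 'Location:') === 0) {
            return strpos($line, $login_path) !== false;
        }
    }
    return false;
}

// Sketch of a fetch that does not follow redirects, so the 302 stays visible.
function fetch_profile_html(string $username): ?string {
    $context = stream_context_create(['http' => [
        'follow_location' => 0,     // keep the redirect instead of following it
        'ignore_errors'   => true,  // still return a body on non-2xx responses
        'header'          => "User-Agent: Mozilla/5.0\r\n", // placeholder value
    ]]);
    $html = @file_get_contents('https://www.instagram.com/' . rawurlencode($username) . '/', false, $context);
    if ($html === false || is_login_redirect($http_response_header ?? [])) {
        return null; // network failure or bounced to the login page
    }
    return $html;
}

// Offline check with made-up header lines:
var_dump(is_login_redirect([
    'HTTP/1.1 302 Found',
    'Location: https://www.instagram.com/accounts/login/?next=/someuser/',
])); // bool(true)
```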
updated link
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020