Stack Overflow Asked by jaklh on December 16, 2021
I’m writing a script that opens firefox with the first duckduckgo result it finds for a given term.
I know. It's very useful.
But when copying a url from my browser and requesting it with python:
import requests as r
url = "https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software"
req = r.get(url)
Duckduckgo returns a 418.
What is happening?
Does duckduckgo recognize that I’m doing automated requests and decide to turn into a teapot?
And if so, how can I avoid it?
Also I know there’s a duckduckgo api for python but I’m doing this project to get started with requests and beautifulsoup.
I would have added this to my existing answer, but I would have exceeded the maximum character length. In a way it is a second answer related to my first, in which I stated that the HTML returned by requests may not reflect what you see in a browser, because JavaScript that is loaded and executed can modify the page contents. In that case you need to resort to a tool such as selenium:
from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get("https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software")
time.sleep(3) # give page a chance to fully load
print(driver.page_source)
driver.quit()
And the output:
<html lang="en_US" class="has-zcm js no-touch opacity csstransforms3d csstransitions svg cssfilters is-not-mobile-device full-urls has-footer"><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><title>python request duckduckgo at DuckDuckGo</title><link rel="stylesheet" href="/s1909.css" type="text/css"><link rel="stylesheet" href="/r1909.css" type="text/css"><meta name="robots" content="noindex,nofollow"><meta name="referrer" content="origin"><meta name="apple-mobile-web-app-title" content="python request duckduckgo"><link rel="preconnect" href="https://links.duckduckgo.com"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"><link id="icon60" rel="apple-touch-icon" href="/assets/icons/meta/DDG-iOS-icon_60x60.png?v=2"><link id="icon76" rel="apple-touch-icon" sizes="76x76" href="/assets/icons/meta/DDG-iOS-icon_76x76.png?v=2"><link id="icon120" rel="apple-touch-icon" sizes="120x120" href="/assets/icons/meta/DDG-iOS-icon_120x120.png?v=2"><link id="icon152" rel="apple-touch-icon" sizes="152x152" href="/assets/icons/meta/DDG-iOS-icon_152x152.png?v=2"><link rel="image_src" href="/assets/icons/meta/DDG-icon_256x256.png"><script type="text/javascript" src="s2475.js"></script><script type="text/javascript" async="" src="/a.js?p=1&q=python%20request%20duckduckgo&from=nlp_longtail"></script><script type="text/javascript" async="" src="/d.js?q=python%20request%20duckduckgo&l=us-en&s=0&a=ffab&dl=en&ct=US&ss_mkt=us&vqd=3-149609696422854606330346289888770817762-334759569224655846812843926827516956510&p_ent=&ex=-1&sp=1&v7exp=a&sltexp=b&wiadrk=b"></script><script type="text/javascript" async="" src="/t.js?q=python%20request%20duckduckgo&l=us-en&s=0&dl=en&ct=US&ss_mkt=us&p_ent=&ex=-1&v7exp=a&sltexp=b&wiadrk=b"></script><script type="text/javascript">var 
ct,fd,fq,it,iqa,iqm,iqs,iqp,iqq,qw,dl,ra,rv,rad,r1hc,r1c,r2c,r3c,rfq,rq,rds,rs,rt,rl,y,y1,ti,tig,iqd,locale,settings_js_version='s2475.js',is_twitter='',rpl=1;fq=0;fd=1;it=0;iqa=0;iqbi=0;iqm=0;iqs=0;iqp=0;iqq=0;qw=3;dl='en';ct='US';iqd=0;r1hc=0;r1c=0;r3c=0;rq='python%20request%20duckduckgo';rqd="python request duckduckgo";rfq=0;rt='';ra='ffab';rv='';rad='';rds=30;rs=0;spice_version='2000';spice_paths='{}';locale='en_US';settings_url_params={};rl='us-en';rlo=0;df='';ds='';sfq='';iar='';vqd='3-149609696422854606330346289888770817762-334759569224655846812843926827516956510';safe_ddg=0;show_covid=0;</script><meta name="viewport" content="width=device-width, initial-scale=1"><meta name="HandheldFriendly" content="true"><meta name="apple-mobile-web-app-capable" content="no"><link title="DuckDuckGo" type="application/opensearchdescription+xml" rel="search" href="https://duckduckgo.com/opensearch.xml?atb=v231-2__"></head><body class="body--serp"><input id="state_hidden" name="state_hidden" type="text" size="1"><span class="hide">Ignore this box please.</span><div id="spacing_hidden_wrapper"><div id="spacing_hidden"></div></div><script type="text/javascript" src="/lib/l118.js"></script><script type="text/javascript" src="/locale/en_US/duckduckgo14.js"></script><script type="text/javascript" src="/util/u469.js"></script><script type="text/javascript" src="/d2827.js"></script><div class="site-wrapper js-site-wrapper" style="min-height: 928px;"><div class="welcome-wrap js-welcome-wrap"></div><div id="header_wrapper" class="header-wrap js-header-wrap"><div id="header" class="header cw"><div class="header__search-wrap"><a tabindex="-1" href="/?t=ffab" class="header__logo-wrap js-header-logo"><span class="header__logo js-logo-ddg">DuckDuckGo</span></a><div class="header__content header__search"><form id="search_form" class="search--adv search--header js-search-form has-text" name="x" action="/" method="GET"><input type="text" name="q" tabindex="1" autocomplete="off" 
id="search_form_input" class="search__input--adv js-search-input" value="python request duckduckgo" autocapitalize="off" autocorrect="off"><input id="search_form_input_clear" class="search__clear js-search-clear" type="button" tabindex="3" value="X"><input id="search_button" class="search__button js-search-button" type="submit" tabindex="2" value="S"><a id="search_dropdown" class="search__dropdown" href="javascript:;" tabindex="4"></a><div id="search_elements_hidden" class="search__hidden js-search-hidden"><input type="hidden" class="js-search-hidden-field" name="t" value="ffab"></div><div class="search__autocomplete"><div class="acp-wrap js-acp-wrap"></div><div class="acp-footer is-hidden js-acp-footer"><span class="acp-footer__instructions">Shortcuts to other sites to search off DuckDuckGo</span><span class="acp-footer__link"><a class="no-visited js-acp-footer-link" href="/bang">Learn More</a></span></div></div></form></div></div><div id="duckbar" class="zcm-wrap zcm-wrap--header is-noscript-hidden"><div class="zcm"><ul class="zcm__menu zcm__constant has-zci" id="duckbar_static"><li class="zcm__item"><a data-zci-link="web" class="zcm__link js-zci-link js-zci-link--web " href="#">All</a></li><li class="zcm__item"><a data-zci-link="images" class="zcm__link js-zci-link js-zci-link--images " href="#">Images</a></li><li class="zcm__item"><a data-zci-link="videos" class="zcm__link js-zci-link js-zci-link--videos " href="#">Videos</a></li><li class="zcm__item"><a data-zci-link="news" class="zcm__link js-zci-link js-zci-link--news " href="#">News</a></li><li class="zcm__item"><a data-zci-link="maps_expanded" class="zcm__link js-zci-link js-zci-link--maps_expanded " href="#">Maps</a></li></ul><ul class="zcm__menu zcm__dynamic" id="duckbar_new"><span id="duckbar_dynamic_sep" class="zcm__sep--h sep--before is-hidden"></span></ul><ul class="zcm__menu zcm__dropdowns js-duckbar-dropdowns" id="duckbar_dropdowns"><span class="zcm__sep--h sep--before is-hidden 
js-duckbar-dropdowns-separator"></span><li class="zcm__item"><div class="dropdown dropdown--settings"><a class="zcm__link dropdown__button js-dropdown-button">Settings</a></div></li></ul></div></div></div><div class="header--aside js-header-aside"><a class="header__button--menu js-side-menu-open" href="#">⇶</a><div class="header--aside__item showcase header__label"><span class="header__clickable js-hl-button" data-type="showcase"><span class="js-popout-trig" aria-haspopup="true" aria-label="Check out the list of things that we've also made." role="button" aria-pressed="false"><span id="wedonttrack">Privacy, simplified.</span></span><span class="popout-trig js-popout"><span class="js-popout-link js-showcase-popout ddgsi ddgsi-down" aria-hidden="true" data-type="showcase"></span><div class="modal modal--popout modal--popout--bottom-left modal--popout--sm js-popout-main" data-type="showcase"><div class="modal__box"><div class="modal__body"><nav aria-labelledby="wedonttrack"><section class="showcase__dropdown-top"><ul aria-label="Here are some things that we made that you might like."><li class="fix showcase__dropdown__list"><a href="/app" class="eighteen js-hl-item" aria-hidden="true" data-type="showcase" data-id="app"><div class="woman-icon"></div></a><a href="/app" class="text-left showcase__link eighty js-hl-item" data-type="showcase" data-id="app"><h1 class="showcase__heading">Get Our App & Extension</h1><p class="showcase__subheading">Protect your data on every device.</p></a></li><li class="fix showcase__dropdown__list"><a href="/newsletter" class="eighteen js-hl-item" aria-hidden="true" data-type="showcase" data-id="newsletter"><div class="mailbox-icon"></div></a><a href="/newsletter" class="text-left showcase__link eighty js-hl-item" data-type="showcase" data-id="newsletter"><h1 class="showcase__heading">Privacy in Your Inbox</h1><p class="showcase__subheading">Stay protected and informed with our privacy newsletters.</p></a></li><li class="fix 
showcase__dropdown__list"><a href="https://spreadprivacy.com/tag/device-privacy-tips/" class="eighteen js-hl-item" aria-hidden="true" data-type="showcase" data-id="blog"><div class="privacy-simplified-icon"></div></a><a href="https://spreadprivacy.com/tag/device-privacy-tips/" class="text-left showcase__link eighty js-hl-item" data-type="showcase" data-id="blog"><h1 class="showcase__heading">Protect Your Devices</h1><p class="showcase__subheading">Check out our privacy device guides.</p></a></li><li class="fix showcase__dropdown__list"><a href="https://duckduckgo.com/spread" class="eighteen js-hl-item" aria-hidden="true" data-type="showcase" data-id="spread"><div class="spread-icon"></div></a><a href="https://duckduckgo.com/spread" class="text-left showcase__link eighty js-hl-item" data-type="showcase" data-id="spread"><h1 class="showcase__heading">Spread DuckDuckGo</h1><p class="showcase__subheading">Help your friends and family join the Duck Side!</p></a></li></ul></section><section class="showcase__dropdown-bottom"><ul class="text-left" aria-label="We've got even more things for you."><li class="fix showcase__dropdown__list"><a href="https://duckduckgo.com/donations" class="eighteen showcase__icon js-hl-item" aria-hidden="true" data-type="showcase" data-id="donations"><div class="donations-icon"></div></a><a href="https://duckduckgo.com/donations" class="text-left showcase__link eighty showcase__text js-hl-item" data-type="showcase" data-id="donations">$1,900,000 in privacy donations!</a></li><li class="fix showcase__dropdown__list"><a href="https://duckduckgo.com/traffic" class="eighteen showcase__icon js-hl-item" aria-hidden="true" data-type="showcase" data-id="traffic"><div class="traffic-icon"></div></a><a href="https://duckduckgo.com/traffic" class="text-left showcase__link eighty showcase__text js-hl-item" data-type="showcase" data-id="traffic">Over 51 Billion anonymous searches.</a></li><li class="fix showcase__dropdown__list"><a 
href="https://donttrack.us/" class="eighteen showcase__icon js-hl-item" aria-hidden="true" data-type="showcase" data-id="dnt"><div class="privacy-tips-icon"></div></a><a href="https://donttrack.us/" class="text-left showcase__link eighty showcase__text js-hl-item" data-type="showcase" data-id="dnt">Learn why reducing tracking is important.</a></li></ul></section></nav></div></div></div></span></span></div><div class="header--aside__item header--aside__social header__label social"><span class="header__clickable js-hl-button" data-type="social"><span class="js-popout-trig header--aside__social-icon " aria-haspopup="true" aria-label="Keep in touch" role="button" aria-pressed="false"><span class="ddgsi ddgsi-horn" data-type="social"></span></span><span class="popout-trig js-popout"><span class="js-popout-link ddgsi ddgsi-down" aria-hidden="true" data-type="social"></span><div class="modal modal--popout modal--popout--bottom-left modal--popout--sm js-popout-main" data-type="social"><div class="modal__box"><div class="modal__body"><div class="social__link"><a href="https://twitter.com/duckduckgo" class="js-hl-item social__link__text" data-type="social" data-id="twitter"><img class="social__icon js-lazysvg" data-src="/assets/icons/header/twitter.svg"><span>Twitter</span></a></div><div class="social__link"><a href="https://reddit.com/r/duckduckgo" class="js-hl-item social__link__text" data-type="social" data-id="reddit"><img class="social__icon js-lazysvg" data-src="/assets/icons/header/reddit.svg"><span>Reddit</span></a></div><div class="social__link"><a href="https://spreadprivacy.com" class="js-hl-item social__link__text" data-type="social" data-id="blog"><img class="social__icon js-lazysvg" data-src="/assets/icons/header/blog.svg"><span>Blog</span></a></div><div class="social__link"><a href="https://duckduckgo.com/newsletter" class="js-hl-item social__link__text" data-type="social" data-id="newsletter"><img class="social__icon js-lazysvg" 
data-src="/assets/icons/header/newsletter.svg"><span>Newsletter</span></a></div></div></div></div></span></span></div></div></div><div id="zero_click_wrapper" class="zci-wrap"></div><div id="vertical_wrapper" class="verticals"></div><div id="web_content_wrapper" class="content-wrap "><div class="serp__top-right js-serp-top-right"></div><div class="serp__bottom-right js-serp-bottom-right"><div class="js-feedback-btn-wrap"><div class="btn feedback-btn"><a href="#" class="feedback-btn__send js-feedback-start">Send feedback</a><div class="feedback-btn__icon-wrap is-hidden js-feedback-icon-wrap"><a href="#" class="feedback-btn__icon ddgsi feedback-btn__icon--love js-feedback-love"></a><a href="#" class="feedback-btn__icon ddgsi feedback-btn__icon--nolove js-feedback-nolove"></a></div></div></div></div><div class="cw"><div id="links_wrapper" class="serp__results js-serp-results"><div class="results--main"><div class="search-filters-wrap"><div class="js-search-filters search-filters"><div class="dropdown dropdown--region "><a class="dropdown__button dropdown__button js-dropdown-button">All Regions</a></div><div class="dropdown dropdown--safe-search "><a href="#" class="dropdown__button js-dropdown-button">Safe Search: Moderate</a></div><div class="dropdown dropdown--date "><a href="#" class="dropdown__button js-dropdown-button">Any Time</a></div></div></div><noscript><meta http-equiv="refresh" content="0;URL=/html?q=python%20request%20duckduckgo"><link href="/css/noscript.css" rel="stylesheet" type="text/css"><div class="msg msg--noscript"><p class="msg-title--noscript">You are being redirected to the non-JavaScript site.</p>Click <a href="/html/?q=python%20request%20duckduckgo">here</a> if it doesn't happen automatically.</div></noscript><div id="message" class="results--message"></div><div class="ia-modules js-ia-modules"></div><div id="ads" class="results--ads results--ads--main js-results-ads"></div><div id="links" class="results js-results"><div id="r1-0" 
class="result results_links_deep highlight_d result--url-above-snippet" data-domain="pypi.org" data-hostname="pypi.org" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://pypi.org/project/DuckDuckGo-Python3-Library/"><b>DuckDuckGo</b>-Python3-Library · PyPI</a><a rel="noopener" class="result__check" href="https://pypi.org/project/DuckDuckGo-Python3-Library/"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:pypi.org&t=ffab" title="Search domain pypi.org/project/DuckDuckGo-Python3-Library/" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/pypi.org.ico" height="16" width="16" title="Search domain pypi.org/project/DuckDuckGo-Python3-Library/" class="result__icon__img js-lazyload-icons" src="//external-content.duckduckgo.com/ip3/pypi.org.ico"></a></span><a href="https://pypi.org/project/DuckDuckGo-Python3-Library/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://pypi.org</span><span class="result__url__full">/project/DuckDuckGo-Python3-Library/</span></a></div></div><div class="result__snippet js-result-snippet">Files for <b>DuckDuckGo</b>-Python3-Library, version 1.0; Filename, size File type <b>Python</b> version Upload date Hashes; Filename, size <b>DuckDuckGo</b> Python3 Library-1..tar.gz (2.0 kB) File type Source <b>Python</b> version None Upload date Dec 22, 2016 Hashes View</div></div></div><div id="organic-module"></div><div id="r1-1" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="stackoverflow.com" data-hostname="stackoverflow.com" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" 
rel="noopener" href="https://stackoverflow.com/questions/63058873/duckduckgo-returns-418-when-requesting-with-python"><b>Duckduckgo</b> returns 418 when requesting with <b>Python</b> - Stack ...</a><a rel="noopener" class="result__check" href="https://stackoverflow.com/questions/63058873/duckduckgo-returns-418-when-requesting-with-python"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:stackoverflow.com&t=ffab" title="Search domain stackoverflow.com/questions/63058873/duckduckgo-returns-418-when-requesting-with-python" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/stackoverflow.com.ico" height="16" width="16" title="Search domain stackoverflow.com/questions/63058873/duckduckgo-returns-418-when-requesting-with-python" class="result__icon__img js-lazyload-icons" src="//external-content.duckduckgo.com/ip3/stackoverflow.com.ico"></a></span><a href="https://stackoverflow.com/questions/63058873/duckduckgo-returns-418-when-requesting-with-python" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://stackoverflow.com</span><span class="result__url__full">/questions/63058873/duckduckgo-returns-418-when-requesting-with-python</span></a></div></div><div class="result__snippet js-result-snippet">I'm writing a script that opens firefox with the first <b>duckduckgo</b> result it finds for a given term. I know. Its very useful. 
But when copying a url from my browser and requesting it with <b>python</b>: ur...</div></div></div><div id="r1-2" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="github.com" data-hostname="github.com" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://github.com/crazedpsyc/python-duckduckgo/">GitHub - crazedpsyc/<b>python</b>-<b>duckduckgo</b>: A library for ...</a><a rel="noopener" class="result__check" href="https://github.com/crazedpsyc/python-duckduckgo/"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:github.com&t=ffab" title="Search domain github.com/crazedpsyc/python-duckduckgo/" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/github.com.ico" height="16" width="16" title="Search domain github.com/crazedpsyc/python-duckduckgo/" class="result__icon__img js-lazyload-icons" src="//external-content.duckduckgo.com/ip3/github.com.ico"></a></span><a href="https://github.com/crazedpsyc/python-duckduckgo/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://github.com</span><span class="result__url__full">/crazedpsyc/python-duckduckgo/</span></a></div></div><div class="result__snippet js-result-snippet">A library for querying the <b>DuckDuckGo</b> API. 
Contribute to crazedpsyc/<b>python</b>-<b>duckduckgo</b> development by creating an account on GitHub.</div></div></div><div id="r1-3" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="github.com" data-hostname="github.com" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://github.com/mikejs/python-duckduckgo">GitHub - mikejs/<b>python</b>-<b>duckduckgo</b>: A library for querying ...</a><a rel="noopener" class="result__check" href="https://github.com/mikejs/python-duckduckgo"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:github.com&t=ffab" title="Search domain github.com/mikejs/python-duckduckgo" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/github.com.ico" height="16" width="16" title="Search domain github.com/mikejs/python-duckduckgo" class="result__icon__img js-lazyload-icons" src="//external-content.duckduckgo.com/ip3/github.com.ico"></a></span><a href="https://github.com/mikejs/python-duckduckgo" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://github.com</span><span class="result__url__full">/mikejs/python-duckduckgo</span></a></div></div><div class="result__snippet js-result-snippet">A library for querying the Duck Duck Go API. 
Contribute to mikejs/<b>python</b>-<b>duckduckgo</b> development by creating an account on GitHub.</div></div></div><div id="r1-4" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="pypi.org" data-hostname="pypi.org" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://pypi.org/project/requests-custom/"><b>requests</b>-custom · PyPI</a><a rel="noopener" class="result__check" href="https://pypi.org/project/requests-custom/"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:pypi.org&t=ffab" title="Search domain pypi.org/project/requests-custom/" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/pypi.org.ico" height="16" width="16" title="Search domain pypi.org/project/requests-custom/" class="result__icon__img js-lazyload-icons" src="//external-content.duckduckgo.com/ip3/pypi.org.ico"></a></span><a href="https://pypi.org/project/requests-custom/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://pypi.org</span><span class="result__url__full">/project/requests-custom/</span></a></div></div><div class="result__snippet js-result-snippet"><b>Python's</b> <b>requests</b> with custom configuration. Package to work with custom <b>requests</b> capabilities. 
Current capabilities available:</div></div></div><div id="r1-5" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="www.blog.pythonlibrary.org" data-hostname="www.blog.pythonlibrary.org" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://www.blog.pythonlibrary.org/2012/06/08/python-101-how-to-submit-a-web-form/"><b>Python</b> 101: How to submit a web form - The Mouse Vs. The ...</a><a rel="noopener" class="result__check" href="https://www.blog.pythonlibrary.org/2012/06/08/python-101-how-to-submit-a-web-form/"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:www.blog.pythonlibrary.org&t=ffab" title="Search domain www.blog.pythonlibrary.org/2012/06/08/python-101-how-to-submit-a-web-form/" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/www.blog.pythonlibrary.org.ico" height="16" width="16" title="Search domain www.blog.pythonlibrary.org/2012/06/08/python-101-how-to-submit-a-web-form/" class="result__icon__img js-lazyload-icons" src="//external-content.duckduckgo.com/ip3/www.blog.pythonlibrary.org.ico"></a></span><a href="https://www.blog.pythonlibrary.org/2012/06/08/python-101-how-to-submit-a-web-form/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://www.blog.pythonlibrary.org</span><span class="result__url__full">/2012/06/08/python-101-how-to-submit-a-web-form/</span></a></div></div><div class="result__snippet js-result-snippet">Today we'll spend some time looking at three different ways to make <b>Python</b> submit a web form. 
In this case, we will be doing a web search with <b>duckduckgo</b>.com searching on the term "<b>python</b>" and saving the result as an HTML file. We will use <b>Python's</b> included urllib modules and two 3rd party packages: <b>requests</b> and mechanize.We have three small scripts to cover, so let's get cracking!</div></div></div><div id="r1-6" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="pypi.org" data-hostname="pypi.org" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://pypi.org/project/duckduckpy/">duckduckpy · PyPI - The <b>Python</b> Package Index</a><a rel="noopener" class="result__check" href="https://pypi.org/project/duckduckpy/"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:pypi.org&t=ffab" title="Search domain pypi.org/project/duckduckpy/" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/pypi.org.ico" height="16" width="16" title="Search domain pypi.org/project/duckduckpy/" class="result__icon__img js-lazyload-icons"></a></span><a href="https://pypi.org/project/duckduckpy/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://pypi.org</span><span class="result__url__full">/project/duckduckpy/</span></a></div></div><div class="result__snippet js-result-snippet">Features. 
Uses standard library only; Works on <b>Python</b> 2.6+ and 3.3+ Unit test coverage 100%; SSL and unicode support; Licensed under MIT license</div></div></div><div id="r1-7" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="pypi.org" data-hostname="pypi.org" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="https://pypi.org/project/duckduckgo2/">duckduckgo2 · PyPI - The <b>Python</b> Package Index</a><a rel="noopener" class="result__check" href="https://pypi.org/project/duckduckgo2/"><span class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:pypi.org&t=ffab" title="Search domain pypi.org/project/duckduckgo2/" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/pypi.org.ico" height="16" width="16" title="Search domain pypi.org/project/duckduckgo2/" class="result__icon__img js-lazyload-icons"></a></span><a href="https://pypi.org/project/duckduckgo2/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">https://pypi.org</span><span class="result__url__full">/project/duckduckgo2/</span></a></div></div><div class="result__snippet js-result-snippet">Hashes for duckduckgo2-.242.tar.gz; Algorithm Hash digest; SHA256: 46ad296a518f183ae62d2d15459bcc69ca44a7614556f27f59dfc0db6599136a: Copy MD5</div></div></div><div id="r1-8" class="result results_links_deep highlight_d result--url-above-snippet" data-domain="duckduckgo.com" data-hostname="duckduckgo.com" data-nir="1"><div class="result__body links_main links_deep"><h2 class="result__title"><a class="result__a" rel="noopener" href="http://duckduckgo.com/"><b>DuckDuckGo</b></a><a rel="noopener" class="result__check" href="http://duckduckgo.com/"><span 
class="result__check__tt">Your browser indicates if you've visited this link</span></a></h2><div class="result__extras js-result-extras"><div class="result__extras__url"><span class="result__icon "><a href="/?q=python%20request%20duckduckgo+site:duckduckgo.com&t=ffab" title="Search domain duckduckgo.com" class="js-result-extras-site_search"><img data-src="//external-content.duckduckgo.com/ip3/duckduckgo.com.ico" height="16" width="16" title="Search domain duckduckgo.com" class="result__icon__img js-lazyload-icons"></a></span><a href="http://duckduckgo.com/" rel="noopener" class="result__url js-result-extras-url"><span class="result__url__domain">duckduckgo.com</span><span class="result__url__full"></span></a></div></div><div class="result__snippet js-result-snippet">The Internet privacy company that empowers you to seamlessly take control of your personal information online, etc. (too long to post all of it)
You could, of course, use driver.page_source as input to Beautiful Soup.
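As a rough sketch of that idea (not part of the original code): in the output above, each result title is an a element with class result__a, so the first such element should be the first result. The function name first_result_url and the trimmed sample string are illustrative only, and the class name is just what DuckDuckGo served in this output; it may change.

```python
from bs4 import BeautifulSoup


def first_result_url(page_html):
    """Return the href of the first result link, or None if none is found.

    Assumes result titles are <a class="result__a"> elements, as in the
    output shown above.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    link = soup.find("a", class_="result__a")
    return link.get("href") if link is not None else None


# Trimmed stand-in for driver.page_source, based on the first result above.
sample = (
    '<div id="links"><div id="r1-0"><h2 class="result__title">'
    '<a class="result__a" rel="noopener" '
    'href="https://pypi.org/project/DuckDuckGo-Python3-Library/">'
    "DuckDuckGo-Python3-Library</a></h2></div></div>"
)

print(first_result_url(sample))  # https://pypi.org/project/DuckDuckGo-Python3-Library/
```

In the selenium script you would pass driver.page_source instead of sample. The returned URL could then be handed to the standard library's webbrowser.open, which opens it in the default browser.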
Answered by Booboo on December 16, 2021
You need to add a 'user-agent' header, even one as simple as:
req = r.get(url, headers={'user-agent': 'my-app/0.0.1'})
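The 418 is presumably a reaction to the default User-Agent that requests sends when you do not set one; that is my reading of the behaviour, not something DuckDuckGo documents here. A small sketch that shows what the default looks like and confirms that a custom header replaces it, without sending anything over the network:

```python
import requests

# The User-Agent that requests sends when none is supplied.
default_agent = requests.utils.default_user_agent()
print(default_agent)  # python-requests/<installed version>

# Prepare (but do not send) a request to inspect the outgoing headers.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request(
        "GET",
        "https://duckduckgo.com/",
        headers={"user-agent": "my-app/0.0.1"},
    )
)
custom_agent = prepared.headers["User-Agent"]
print(custom_agent)  # my-app/0.0.1
```

Header names are case-insensitive, which is why 'user-agent' overrides the session's 'User-Agent'.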
Update: complete code with reasonably named variables
import requests
url = "https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software"
response = requests.get(url, headers={'user-agent': 'my-app/0.0.1'})
response.raise_for_status() # raise an exception on a 4xx or 5xx status code
# or test response.status_code if you do not want to raise an exception
data = response.text # this is the HTML assuming that is what the URL returns
print(data)
Prints:
<!DOCTYPE html><html lang="en_US" class="no-js has-zcm no-theme "><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><title>python request duckduckgo at DuckDuckGo</title><link rel="stylesheet" href="/s1909.css" type="text/css"><link rel="stylesheet" href="/r1909.css" type="text/css"><meta name="robots" content="noindex,nofollow"><meta name="referrer" content="origin"><meta name="apple-mobile-web-app-title" content="python request duckduckgo"><link rel="preconnect" href="https://links.duckduckgo.com"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link id="icon60" rel="apple-touch-icon" href="/assets/icons/meta/DDG-iOS-icon_60x60.png?v=2"/><link id="icon76" rel="apple-touch-icon" sizes="76x76" href="/assets/icons/meta/DDG-iOS-icon_76x76.png?v=2"/><link id="icon120" rel="apple-touch-icon" sizes="120x120" href="/assets/icons/meta/DDG-iOS-icon_120x120.png?v=2"/><link id="icon152" rel="apple-touch-icon" sizes="152x152" href="/assets/icons/meta/DDG-iOS-icon_152x152.png?v=2"/><link rel="image_src" href="/assets/icons/meta/DDG-icon_256x256.png"/><script type="text/javascript">var ct,fd,fq,it,iqa,iqm,iqs,iqp,iqq,qw,dl,ra,rv,rad,r1hc,r1c,r2c,r3c,rfq,rq,rds,rs,rt,rl,y,y1,ti,tig,iqd,locale,settings_js_version='s2475.js',is_twitter='',rpl=1;fq=0;fd=1;it=0;iqa=0;iqbi=0;iqm=0;iqs=0;iqp=0;iqq=0;qw=3;dl='en';ct='US';iqd=0;r1hc=0;r1c=0;r3c=0;rq='python%20request%20duckduckgo';rqd="python request duckduckgo";rfq=0;rt='';ra='ffab';rv='';rad='';rds=30;rs=0;spice_version='2000';spice_paths='{}';locale='en_US';settings_url_params={};rl='us-en';rlo=0;df='';ds='';sfq='';iar='';vqd='3-149609696422854606330346289888770817762-151254838983446808561626137548835915940';safe_ddg=0;show_covid=0;</script><meta name="viewport" content="width=device-width, initial-scale=1" /><meta name="HandheldFriendly" content="true" /><meta name="apple-mobile-web-app-capable" content="no" /></head><body class="body--serp"><input id="state_hidden" name="state_hidden" 
type="text" size="1"><span class="hide">Ignore this box please.</span><div id="spacing_hidden_wrapper"><div id="spacing_hidden"></div></div><script type="text/javascript" src="/lib/l118.js"></script><script type="text/javascript" src="/locale/en_US/duckduckgo14.js"></script><script type="text/javascript" src="/util/u469.js"></script><script type="text/javascript" src="/d2827.js"></script><div class="site-wrapper js-site-wrapper"><div class="welcome-wrap js-welcome-wrap"></div><div id="header_wrapper" class="header-wrap js-header-wrap"><div id="header" class="header cw"><div class="header__search-wrap"><a tabindex="-1" href="/?t=ffab" class="header__logo-wrap js-header-logo"><span class="header__logo js-logo-ddg">DuckDuckGo</span></a><div class="header__content header__search"><form id="search_form" class="search--adv search--header js-search-form" name="x" action="/"><input type="text" name="q" tabindex="1" autocomplete="off" id="search_form_input" class="search__input search__input--adv js-search-input" value="python request duckduckgo"><input id="search_form_input_clear" class="search__clear js-search-clear" type="button" tabindex="3" value="X"/><input id="search_button" class="search__button js-search-button" type="submit" tabindex="2" value="S" /><a id="search_dropdown" class="search__dropdown" href="javascript:;" tabindex="4"></a><div id="search_elements_hidden" class="search__hidden js-search-hidden"></div></form></div></div><div id="duckbar" class="zcm-wrap zcm-wrap--header is-noscript-hidden"></div></div><div class="header--aside js-header-aside"></div></div><div id="zero_click_wrapper" class="zci-wrap"></div><div id="vertical_wrapper" class="verticals"></div><div id="web_content_wrapper" class="content-wrap "><div class="serp__top-right js-serp-top-right"></div><div class="serp__bottom-right js-serp-bottom-right"><div class="js-feedback-btn-wrap"></div></div><div class="cw"><div id="links_wrapper" class="serp__results js-serp-results"><div 
class="results--main"><div class="search-filters-wrap"><div class="js-search-filters search-filters"></div></div><noscript><meta http-equiv="refresh" content="0;URL=/html?q=python%20request%20duckduckgo"><link href="/css/noscript.css" rel="stylesheet" type="text/css"><div class="msg msg--noscript"><p class="msg-title--noscript">You are being redirected to the non-JavaScript site.</p>Click <a href="/html/?q=python%20request%20duckduckgo">here</a> if it doesn't happen automatically.</div></noscript><div id="message" class="results--message"></div><div class="ia-modules js-ia-modules"></div><div id="ads" class="results--ads results--ads--main is-invisible js-results-ads"></div><div id="links" class="results is-invisible js-results"></div></div><div class="results--sidebar js-results-sidebar"><div class="sidebar-modules js-sidebar-modules"></div><div class="is-invisible js-sidebar-ads"></div></div></div></div></div><div id="bottom_spacing2"> </div></div><script type="text/javascript"></script><script type="text/JavaScript">function nrji() {nrj('/t.js?q=python%20request%20duckduckgo&l=us-en&s=0&dl=en&ct=US&ss_mkt=us&p_ent=&ex=-1');nrj('/d.js?q=python%20request%20duckduckgo&l=us-en&s=0&a=ffab&dl=en&ct=US&ss_mkt=us&vqd=3-149609696422854606330346289888770817762-151254838983446808561626137548835915940&p_ent=&ex=-1&sp=1');;};DDG.ready(nrji, 1);</script><script src="/g2379.js"></script><script type="text/javascript">DDG.page = new DDG.Pages.SERP({ showSafeSearch: 0, instantAnswerAds: false });</script><div id="z2"> </div><div id="z"></div></body></html>
Keep in mind that the HTML may contain JavaScript that runs after the page has loaded and modifies the page content, so what you see in a browser may not match the HTML that requests returns. In the output above, for example, the div with id="links" is empty and the noscript block points to a non-JavaScript version of the page. If that is the case, you probably need a different tool such as selenium to drive an actual web browser.
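One way to check for this situation programmatically is to look at the static HTML for an empty results container and a noscript fallback. The following is only a minimal sketch: it uses the standard library's html.parser rather than beautifulsoup to stay dependency-free, the names SerpProbe and probe are made up for illustration, and SAMPLE is a trimmed copy of the markup printed above (in practice you would pass req.text).

```python
from html.parser import HTMLParser

# Trimmed from the output above; in practice pass req.text instead.
SAMPLE = (
    '<div id="links_wrapper"><noscript>'
    '<meta http-equiv="refresh" content="0;URL=/html?q=python%20request%20duckduckgo">'
    '</noscript>'
    '<div id="links" class="results is-invisible js-results"></div></div>'
)


class SerpProbe(HTMLParser):
    """Records the <noscript> refresh target and any text inside div#links."""

    def __init__(self):
        super().__init__()
        self.in_noscript = False
        self.fallback_url = None
        self.links_depth = 0      # > 0 while inside <div id="links">
        self.links_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "noscript":
            self.in_noscript = True
        elif tag == "meta" and self.in_noscript:
            if (attrs.get("http-equiv") or "").lower() == "refresh":
                # content looks like "0;URL=/html?q=..."
                _, _, rest = (attrs.get("content") or "").partition(";")
                key, _, target = rest.strip().partition("=")
                if key.strip().lower() == "url":
                    self.fallback_url = target
        elif tag == "div":
            if self.links_depth:
                self.links_depth += 1          # nested div inside div#links
            elif attrs.get("id") == "links":
                self.links_depth = 1           # entering div#links

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False
        elif tag == "div" and self.links_depth:
            self.links_depth -= 1

    def handle_data(self, data):
        if self.links_depth and data.strip():
            self.links_text.append(data.strip())


def probe(html):
    """Return (fallback_url, has_static_results) for a results page."""
    parser = SerpProbe()
    parser.feed(html)
    parser.close()
    return parser.fallback_url, bool(parser.links_text)


fallback, has_results = probe(SAMPLE)
print(fallback)
print(has_results)
```

If has_results is False, the results are most likely filled in by scripts after the page loads, which is the case where a browser-driving tool becomes necessary.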
Answered by Booboo on December 16, 2021
While this might not be the exact cause, DuckDuckGo is definitely not meant for bots that scrape its search results. Take a look at its robots.txt file. A site's robots.txt tells crawlers which pages they are allowed to crawl and which they are not. From the looks of it, everything you're trying to crawl is Disallowed. There's a chance you're getting the teapot response because that is how they respond to crawlers without permission.
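As a rough sketch of how a robots.txt check can be done from Python, the standard library's urllib.robotparser can evaluate a URL against a set of rules. The rules and the example.com URLs below are hypothetical and only illustrate the mechanism; they are not DuckDuckGo's actual file. The helper name allowed is also made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not DuckDuckGo's actual file.
RULES = """\
User-agent: *
Disallow: /search
""".splitlines()

parser = RobotFileParser()
# For a live site you would instead call parser.set_url(<robots.txt URL>)
# followed by parser.read(), which downloads and parses the file.
parser.parse(RULES)


def allowed(url, agent="*"):
    """True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.com/search?q=python"))
print(allowed("https://example.com/about"))
```

Running a check like this before sending requests is a cheap way to find out whether a site wants automated clients on a given path at all.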
If you're trying to learn about requests, it might be better to avoid search engines. Most of the common ones that I know of disallow outside crawlers.
Answered by M Z on December 16, 2021