Unix & Linux Asked by Zombo on January 5, 2022
If you put this link in a browser:
https://unix.stackexchange.com/q/453740#453743
it returns this:
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743
However cURL drops the Hash:
$ curl -I https://unix.stackexchange.com/q/453740#453743
HTTP/2 302
cache-control: no-cache, no-store, must-revalidate
content-type: text/html; charset=utf-8
location: /questions/453740/installing-busybox-for-ubuntu
Does cURL have an option to keep the Hash with the resultant URL? Essentially I
am trying to write a script that will resolve URLs like a browser – this is what
I have so far but it breaks if the URL contains a Hash:
$ set https://unix.stackexchange.com/q/453740#453743
$ curl -L -s -o /dev/null -w %{url_effective} "$1"
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu
Curl download whole pages.
A #
points to a fragment.
Both are not compatible.
The symbol #
is used at the end of a web page link to mark a position inside a whole web page.
...convention called "fragment URLs" to refer to anchors within an HTML document.
What is it when a link has a pound "#" sign in it
It's a "fragment" or "named anchor". You can you use to link to part of a document.
Wikipedia: Uniform Resource Locator (URL)
An optional fragment component preceded by an hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.
Its main use is to move the "presentation layer" (what is viewed) to the start of an item.
There is no "presentation layer" in curl, its goal is to download whole pages, not parts or fragments of pages. Therefore, there is no use for a "fragment" marker in curl. It is simply ignored by curl.
Re-append the tag to the (redirected) link:
originallink='https://unix.stackexchange.com/q/453740#453743'
wholepage=$(curl -Lso /dev/null -w %{url_effective} "$originallink")
if [ "$originallink" != "${originallink##*#}" ]; then
newlink=$wholepage#${originallink##*#}
else
echo "link contains no segment"
newlink="$wholepage"
fi
echo "$newlink"
Will print:
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743
A quite faster solution is to not download the page. It is being redirected to /dev/null
anyway. By removing the -L
option and asking what would be the link if the (first) redirect were followed. The first redirect works in this case and most others.
wholepage=$(curl -so /dev/null -w %{redirect_url} "$originallink")
Answered by ImHere on January 5, 2022
According to this thread on the curl
website titled: Re: How to send fragment part of URL? the hashmark is meant for the browser and not the server, hence why curl
is truncating it.
The fragment part of a URI is not meant to be sent in the HTTP request - it is used to identify a specific section in the resource that will be fetched by using the particular URI. If you want to force #-letter into the request I think encoding it sounds like a perfect idea.
Looking I did not see any method for curl
to persist it beyond encoding it as %23
, which I don't think is what you want.
Since it's the client that's maintaining the string after the hashmark, I'd "lean into it" and simply parse it out and then re-append it to the returned URL from curl
as a true browser client would do it:
$ set 'https://unix.stackexchange.com/q/453740#453743'
$ echo "$(curl -I -L -s -o /dev/null -w %{url_effective} "$1")#$(echo "$1" | cut -d"#" -f2)"
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743
Answered by slm on January 5, 2022
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP