Hi,
I’m trying to open a web page by running a line like this:
urllib2.urlopen('http://www.webpage/.../.../#fragment-identifier')
and I get “The page you requested could not be found…”
I have to open pages with a fragment identifier.
Does anyone know how to do this?
Thanks
OpenSuse 11.4 64bit
Python 2.7
fabio g wrote:
> I’m trying to open a web page by running a line like this:
>
> Code:
> --------------------
>
> urllib2.urlopen('http://www.webpage/.../.../#fragment-identifier')
>
> --------------------
>
> and I get “The page you requested could not be found…”
>
> I have to open pages with a fragment identifier.
> Does anyone know how to do this?
There’s no / before a fragment identifier. It follows directly after the
name of the page.
> There’s no / before a fragment identifier. It follows directly after the
> name of the page
There is a / before the fragment identifier!
See the example below:
>>> from urlparse import urlparse
>>> c = urlparse('http://www.webpage/.../.../#Myfragment-identifier')
>>> c
ParseResult(scheme='http', netloc='www.webpage', path='/.../.../', params='', query='', fragment='Myfragment-identifier')
Do you know how to open web pages with a fragment identifier using Python?
fabio g wrote:
> There’s no / before a fragment identifier. It follows directly after
> the name of the page
>
>
> There is / before a fragment identifier!
> See the example below:
No. The fragment identifier relates to an anchor in an HTML resource.
The HTML resource has a name and such names do not end with /.
A URL ending with a / is a path-part only. It needs resolving, typically
by adding index.html to the end, before it can identify a resource.
Also, fragment identifiers are client-side identifiers. They are not
passed to the web resource server.
http://en.wikipedia.org/wiki/Fragment_identifier
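To illustrate the point that the fragment stays on the client, here is a minimal sketch of how the request line is derived from a URL. The helper name request_target and the example.com URL are made up for illustration; the import fallback is only there so the snippet runs on both Python 3 and the Python 2.7 mentioned in this thread.

```python
try:
    from urllib.parse import urlsplit   # Python 3
except ImportError:
    from urlparse import urlsplit       # Python 2.7, as used in this thread


def request_target(url):
    """Return the part of the URL an HTTP client puts in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays on the client
    return target


# Hypothetical URL, used only to show the split.
print(request_target("http://www.example.com/docs/page.html?x=1#section-2"))
# prints /docs/page.html?x=1
```

Only the path and query travel to the server, so anything after # has to be handled locally once the page has been received.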
> Do you know how to open web pages with fragment identifier using
> python?
No - I don’t know the functions/methods to use in Python.
But I expect the procedure is the same as in other languages:
(1) remove the fragment identifier from the URL
(2) send the URL to the server
(3) receive the resource value from the server
(4) do whatever you want with the resource, applying the fragment
identifier as appropriate
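In Python the four steps above could be sketched roughly like this. It is only an outline under assumptions: the function name fetch_without_fragment and its opener parameter are invented for the example, and the import fallback covers both Python 3 and the Python 2.7 used in this thread.

```python
try:
    from urllib.parse import urldefrag      # Python 3
    from urllib.request import urlopen
except ImportError:
    from urlparse import urldefrag          # Python 2.7
    from urllib2 import urlopen


def fetch_without_fragment(full_url, opener=urlopen):
    """Return (body, fragment) for a URL that may carry a #fragment.

    `opener` is a hypothetical hook so the download step can be replaced,
    for example in a test; by default the standard urlopen is used.
    """
    # (1) remove the fragment identifier from the URL
    base_url, fragment = urldefrag(full_url)
    # (2) send the URL to the server and (3) receive the resource
    response = opener(base_url)
    try:
        body = response.read()
    finally:
        response.close()
    # (4) the caller applies the fragment to the body as appropriate
    return body, fragment
```

A call such as fetch_without_fragment('http://www.webpage/.../.../#fragment-identifier') would then request the page without the fragment and hand the fragment name back alongside the downloaded HTML.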
I have revisited my problem.
I am trying to open a web page like this:
urllib.urlopen('http://www.webpage.com/.../.../#fragment').read()
and it returns ‘page not found…’
When I open the page without #fragment, it returns the main page. Part of the main page is shown below:
<td class="act"><a class="tabFrag" href="/.../.../#fragment" title="Frag"></a></td></tr>
<a href="#fragment" onclick="tab_subContent.select('3');">Frag</a>
I’d like to get an HTML file containing the table contained in #fragment.
Thanks
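One possible way to pull out the part of a downloaded page that a fragment points at is sketched below. It assumes the page contains an element whose id attribute equals the fragment name and that the markup is reasonably well balanced; neither is confirmed for the page in question, and the onclick handler in the snippet above suggests the tab content may be filled in by JavaScript, in which case it would not be in the downloaded HTML at all. The names FragmentExtractor and extract_fragment are made up for this example.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Collect the markup of the first element whose id matches."""

    def __init__(self, fragment_id):
        try:
            # Python 3: keep entities such as &amp; as they appear in the page
            HTMLParser.__init__(self, convert_charrefs=False)
        except TypeError:
            HTMLParser.__init__(self)    # Python 2.7 has no such option
        self.fragment_id = fragment_id
        self.depth = 0       # nesting depth inside the matched element
        self.done = False    # True once the matched element has been closed
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        inside = self.depth > 0
        if not inside and (self.done or dict(attrs).get("id") != self.fragment_id):
            return
        self.pieces.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            if not inside:
                self.done = True   # a void element has no content to collect
        else:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        if self.depth > 0:
            self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Assumption: tags are balanced; unclosed tags would skew the depth.
        if self.depth > 0 and tag not in VOID_TAGS:
            self.pieces.append("</%s>" % tag)
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.pieces.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.pieces.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth > 0:
            self.pieces.append("&#%s;" % name)


def extract_fragment(html_text, fragment_id):
    """Return the markup of the element with the given id, or None."""
    parser = FragmentExtractor(fragment_id)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.pieces) or None
```

The returned string could then be written to a file. The input has to be text, so bytes returned by read() would need decoding with the page's encoding first.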
fabio g wrote:
> I have revisited my problem,
>
> I can trying to open a web page like this:
>
> Code:
> --------------------
>
> urllib.urlopen('http://www.webpage.com/.../.../#fragment').read()
>
> --------------------
>
> and it return ‘page not found…’
>
> I opened the page without #fragment and it return a main page. A part
> of main page is reported below:
>
> Code:
> --------------------
>
> <td class="act"><a class="tabFrag" href="/.../.../#fragment" title="Frag"></a></td></tr>
>
> <a href="#fragment" onclick="tab_subContent.select('3');">Frag</a>
>
> --------------------
>
> I’d like to get an HTML file containing the table contained into
> #fragment.
You’re asking in the wrong place. This is the networking-internet forum,
but you are asking a Python programming question. So the appropriate
forum would be the programming-scripting forum.
It’s possible you might get an answer there - not from me because I’m
not a Python expert - but I think you might have more luck elsewhere.
Either in a Python-specific place or a place specific to the page you
are trying to parse.