Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I need to extract every line that contains the text '.mp4'. The values sit inside a JavaScript block, so there is no HTML tag around them that I could select.

My code:

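import urllib.request
import demjson

url = 'https://myurl'
# read() returns bytes; decode to str (assuming the page is UTF-8) before searching it
content = urllib.request.urlopen(url).read().decode('utf-8')

The relevant part of the page source:
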
<script type="text/javascript">
/* <![CDATA[ */
function getEmbed(width, height) {
    if (width && height) {
        return '<iframe width="' + width + '" height="' + height + '" src="https://www.ptrex.com/embed/33247" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
    }
    return '<iframe width="768" height="432" src="https://www.ptrex.com/embed/33247" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
}

var flashvars = {
    video_id: '33247',
    license_code: '$535555517668914',
    rnd: '1537972655',
    video_url: 'https://www.ptrex.com/get_file/4/996a9088fdf801992d24457cd51469f3f7aaaee6a0/33000/33247/33247.mp4/',
    postfix: '.mp4',
    video_url_text: '480p',
    video_alt_url: 'https://www.ptrex.com/get_file/4/774833c428771edee2cf401ef2264e746a06f9f370/33000/33247/33247_720p.mp4/',
    video_alt_url_text: '720p HD',
    video_alt_url_hd: '1',
    timeline_screens_url: '//di-iu49il1z.leasewebultracdn.com/contents/videos_screenshots/33000/33247/timelines/timeline_mp4/200x116/{time}.jpg',
    timeline_screens_interval: '10',
    preview_url: '//di-iu49il1z.leasewebultracdn.com/contents/videos_screenshots/33000/33247/preview.mp4.jpg',
    skin: 'youtube.css',
    bt: '1',
    volume: '1',
    hide_controlbar: '1',
    hide_style: 'fade',
    related_src: 'https://www.ptrex.com/related_videos_html/33247/',
    adv_pre_vast: 'https://pt.ptawe.com/vast/v3?psid=ed_pntrexvb1&utm_source=bf1&utm_medium=network&ms_notrack=1',
    lrcv: '1556867449254522707330811',
    adv_pre_replay_after: '2',
    embed: '1'
};
var player_obj = kt_player('kt_player', 'https://www.ptrex.com/player/kt_player.swf?v=4.0.2', '100%', '100%', flashvars);
/* ]]> */
</script>

1 Answer

BeautifulSoup can locate the <script> tag, but its contents are JavaScript rather than HTML, so a different approach is still needed to extract the values inside.
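As a rough illustration of that first step, the sketch below pulls out the body of the script that defines flashvars. It uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages. The class name ScriptCollector, the helper find_script, and the shortened sample_html page are all made up for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.in_script = True
            self.scripts.append('')

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current entry
        if self.in_script:
            self.scripts[-1] += data


def find_script(html, marker):
    """Return the first script body containing marker, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for body in collector.scripts:
        if marker in body:
            return body
    return None


# Hypothetical, shortened page used only for illustration
sample_html = """<html><body>
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">
var flashvars = { video_id: '1', video_url: 'https://example.com/1.mp4/' };
</script>
</body></html>"""

script_body = find_script(sample_html, 'var flashvars')
print(script_body.strip())
```

What comes back is still JavaScript source text, so the values have to be extracted from it in a second step.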

Plain string operations can first isolate the flashvars object, which can then be passed to demjson to convert the JavaScript object literal into a Python dictionary. For example:

import demjson

content = """<script type="text/javascript">/* <![CDATA[ */ 
... 
...
</script>"""

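# Keep everything after the assignment, then cut at the closing brace of the object
script_var = content.split('var flashvars = ')[1]
script_var = script_var[:script_var.find('};') + 1]
# demjson tolerates JavaScript syntax such as unquoted keys and single-quoted strings
data = demjson.decode(script_var)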

print(data['video_url'])
print(data['video_alt_url'])

This would then display:

https://www.ptrex.com/get_file/4/996a9088fdf801992d24457cd51469f3f7aaaee6a0/33000/33247/33247.mp4/
https://www.ptrex.com/get_file/4/774833c428771edee2cf401ef2264e746a06f9f370/33000/33247/33247_720p.mp4/
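
Since the question asks for everything containing '.mp4', a looser alternative is to skip parsing the object and search the text directly. The sketch below shows both a plain line filter and a regular expression that keeps only quoted http(s) URLs ending in .mp4. The content string is a shortened, hypothetical stand-in for the downloaded page, and the pattern assumes single-quoted values as in the question.

```python
import re

# Hypothetical, shortened stand-in for the downloaded page text
content = """<script type="text/javascript">
var flashvars = {
    video_id: '1',
    video_url: 'https://example.com/get_file/1/1.mp4/',
    postfix: '.mp4',
    video_alt_url: 'https://example.com/get_file/1/1_720p.mp4/',
    preview_url: '//cdn.example.com/1/preview.mp4.jpg'
};
</script>"""

# Every line that mentions '.mp4' anywhere, including postfix and preview entries
mp4_lines = [line.strip() for line in content.splitlines() if '.mp4' in line]

# Only single-quoted http(s) values that end in .mp4, with an optional trailing slash
mp4_urls = re.findall(r"'(https?://[^']+\.mp4/?)'", content)

for url in mp4_urls:
    print(url)
```

If all the properties sit on one physical line in the real page, the line filter returns that whole line, so the regular expression is the part that actually narrows the result down to the URLs.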

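demjson is an alternative JSON decoder that also accepts non-strict input such as JavaScript object literals. Recent setuptools releases may refuse to build the original package; in that case the demjson3 fork is reported to work as a drop-in replacement. The original package can be installed via pip: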

pip install demjson
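
If installing demjson is not an option, a flat object like this one can also be read with a small regular expression. This is only a sketch under strong assumptions: no nested objects, every value is a single-quoted string, and no value contains an escaped quote. It is not a general JavaScript parser. The function name parse_flat_js_object and the sample script string are made up for this example.

```python
import re


def parse_flat_js_object(text):
    """Parse a flat JS object literal whose values are single-quoted strings.

    Assumes no nesting and no escaped quotes inside values.
    """
    pairs = re.findall(r"(\w+)\s*:\s*'([^']*)'", text)
    return dict(pairs)


# Hypothetical, shortened script text used only for illustration
script = (
    "var flashvars = { video_id: '33', "
    "video_url: 'https://example.com/33.mp4/', postfix: '.mp4' };"
)

# Same slicing as in the answer: keep the text from '{' to the closing '}'
body = script.split('var flashvars = ')[1]
body = body[:body.find('};') + 1]

data = parse_flat_js_object(body)
print(data['video_url'])
```

demjson remains the more robust choice when the object contains numbers, nested values, or escaped characters.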
