• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

Python scraper_utils.pathify_url函数代码示例

原作者: [db:作者] 来自: [db:来源] 收藏 邀请

本文整理汇总了Python中salts_lib.scraper_utils.pathify_url函数的典型用法代码示例。如果您正苦于以下问题:Python pathify_url函数的具体用法?Python pathify_url怎么用?Python pathify_url使用的例子?那么恭喜您, 这里精选的函数代码示例或许可以为您提供帮助。



在下文中一共展示了pathify_url函数的20个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Python代码示例。

示例1: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return the site-relative URL of the post matching *video*, or None.

     Pages through the show's posts (following the 'nextpostslink'
     anchor) and stops early once a post is flagged by __too_old.
     A post matches either by release-name check or, when title matching
     is forced, by comparing normalized episode titles.
     """
     force_title = scraper_utils.force_title(video)
     title_fallback = kodi.get_setting('title-fallback') == 'true'
     norm_title = scraper_utils.normalize_title(video.ep_title)
     # page_url is kept as a one-element list so the while condition can
     # double as the "no next page" terminator.
     page_url = [show_url]
     too_old = False
     while page_url and not too_old:
         html = self._http_get(page_url[0], require_debrid=True, cache_limit=1)
         for _attr, post in dom_parser2.parse_dom(html, 'div', {'id': re.compile('post-\d+')}):
             if self.__too_old(post):
                 # presumably posts are newest-first, so the rest of this
                 # page and later pages are skipped -- TODO confirm
                 too_old = True
                 break
             if CATEGORIES[VIDEO_TYPES.TVSHOW] in post and show_url in post:
                 match = dom_parser2.parse_dom(post, 'a', req='href')
                 if match:
                     url, title = match[0].attrs['href'], match[0].content
                     if not force_title:
                         if scraper_utils.release_check(video, title, require_title=False):
                             return scraper_utils.pathify_url(url)
                     else:
                         # title fallback: compare against the text between
                         # </strong> and </p> inside the post body
                         if title_fallback and norm_title:
                             match = re.search('</strong>(.*?)</p>', post)
                             if match and norm_title == scraper_utils.normalize_title(match.group(1)):
                                 return scraper_utils.pathify_url(url)

         page_url = dom_parser2.parse_dom(html, 'a', {'class': 'nextpostslink'}, req='href')
         if page_url: page_url = [page_url[0].attrs['href']]
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:27,代码来源:2ddl_scraper.py


示例2: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return the site-relative URL of the post matching *video*, or None.

     Older dom_parser variant: pages through the show's posts via the
     'nextpostslink' anchor, stopping once __too_old flags a post.
     Matching is by release-name check, or by normalized-title fallback
     when title matching is forced.
     """
     force_title = scraper_utils.force_title(video)
     title_fallback = kodi.get_setting('title-fallback') == 'true'
     norm_title = scraper_utils.normalize_title(video.ep_title)
     page_url = [show_url]
     too_old = False
     while page_url and not too_old:
         url = urlparse.urljoin(self.base_url, page_url[0])
         html = self._http_get(url, require_debrid=True, cache_limit=1)
         posts = dom_parser.parse_dom(html, 'div', {'id': 'post-\d+'})
         for post in posts:
             if self.__too_old(post):
                 too_old = True
                 break
             if CATEGORIES[VIDEO_TYPES.TVSHOW] in post and show_url in post:
                 match = re.search('<a\s+href="([^"]+)[^>]+>(.*?)</a>', post)
                 if match:
                     # NOTE: rebinds the loop-level `url` to the post link
                     url, title = match.groups()
                     if not force_title:
                         if scraper_utils.release_check(video, title, require_title=False):
                             return scraper_utils.pathify_url(url)
                     else:
                         if title_fallback and norm_title:
                             match = re.search('</strong>(.*?)</p>', post)
                             if match and norm_title == scraper_utils.normalize_title(match.group(1)):
                                 return scraper_utils.pathify_url(url)

         # dom_parser with ret='href' returns the hrefs directly
         page_url = dom_parser.parse_dom(html, 'a', {'class': 'nextpostslink'}, ret='href')
开发者ID:EPiC-APOC,项目名称:repository.xvbmc,代码行数:28,代码来源:2ddl_scraper.py


示例3: search

 def search(self, video_type, title, year, season=''):  # @UnusedVariable
     """Search the site and return result dicts with 'url', 'title', 'year'.

     TV shows are located via their TAGS link (deduplicated by URL);
     movies are matched by normalized title (substring either way) and,
     when both are present, by year.
     """
     results = []
     search_url = '/search/' + urllib.quote_plus(title)
     html = self._http_get(search_url, require_debrid=True, cache_limit=1)
     if video_type == VIDEO_TYPES.TVSHOW:
         seen_urls = {}
         for _attr, post in dom_parser2.parse_dom(html, 'div', {'id': re.compile('post-\d+')}):
             if CATEGORIES[video_type] not in post: continue
             match = re.search('<span>\s*TAGS:\s*</span>\s*<a\s+href="([^"]+)[^>]+>([^<]+)', post, re.I)
             if match:
                 show_url, match_title = match.groups()
                 # the same show can appear in many posts; keep first hit only
                 if show_url in seen_urls: continue
                 result = {'url': scraper_utils.pathify_url(show_url), 'title': scraper_utils.cleanse_title(match_title), 'year': ''}
                 seen_urls[show_url] = result
                 results.append(result)
     elif video_type == VIDEO_TYPES.MOVIE:
         norm_title = scraper_utils.normalize_title(title)
         # headings and post bodies are parsed separately and paired by
         # position -- assumes they appear in the same document order
         headings = re.findall('<h2>\s*<a\s+href="([^"]+)[^>]+>(.*?)</a>', html)
         posts = [result.content for result in dom_parser2.parse_dom(html, 'div', {'id': re.compile('post-\d+')})]
         for heading, post in zip(headings, posts):
             if CATEGORIES[video_type] not in post or self.__too_old(post): continue
             post_url, post_title = heading
             meta = scraper_utils.parse_movie_link(post_title)
             full_title = '%s [%s] (%sp)' % (meta['title'], meta['extra'], meta['height'])
             match_year = meta['year']

             match_norm_title = scraper_utils.normalize_title(meta['title'])
             if (match_norm_title in norm_title or norm_title in match_norm_title) and (not year or not match_year or year == match_year):
                 result = {'url': scraper_utils.pathify_url(post_url), 'title': scraper_utils.cleanse_title(full_title), 'year': match_year}
                 results.append(result)

     return results
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:32,代码来源:2ddl_scraper.py


示例4: _get_episode_url

    def _get_episode_url(self, show_url, video):
        """Return the site-relative URL for the requested episode, or None.

        Tries, in order: the default SxxExx href pattern, an airdate
        match against the episode list (when enabled and available),
        then a normalized episode-title match.
        """
        episode_pattern = 'href="([^"]+-s0*%se0*%s(?!\d)[^"]*)' % (video.season, video.episode)
        result = self._default_get_episode_url(show_url, video, episode_pattern)
        if result:
            return result

        url = urlparse.urljoin(self.base_url, show_url)
        html = self._http_get(url, cache_limit=2)
        fragment = dom_parser.parse_dom(html, "ul", {"class": "episode_list"})
        if fragment:
            # these three lists are paired positionally via zip below
            ep_urls = dom_parser.parse_dom(fragment[0], "a", ret="href")
            ep_dates = dom_parser.parse_dom(fragment[0], "span", {"class": "episode_air_d"})
            ep_titles = dom_parser.parse_dom(fragment[0], "span", {"class": "episode_name"})
            force_title = scraper_utils.force_title(video)
            if not force_title and kodi.get_setting("airdate-fallback") == "true" and video.ep_airdate:
                for ep_url, ep_date in zip(ep_urls, ep_dates):
                    log_utils.log(
                        "Quikr Ep Airdate Matching: %s - %s - %s" % (ep_url, ep_date, video.ep_airdate),
                        log_utils.LOGDEBUG,
                    )
                    if video.ep_airdate == scraper_utils.to_datetime(ep_date, "%Y-%m-%d").date():
                        return scraper_utils.pathify_url(ep_url)

            if force_title or kodi.get_setting("title-fallback") == "true":
                norm_title = scraper_utils.normalize_title(video.ep_title)
                for ep_url, ep_title in zip(ep_urls, ep_titles):
                    # strip the leading <span>...</span> episode number
                    ep_title = re.sub("<span>.*?</span>\s*", "", ep_title)
                    log_utils.log(
                        "Quikr Ep Title Matching: %s - %s - %s" % (ep_url, norm_title, video.ep_title),
                        log_utils.LOGDEBUG,
                    )
                    if norm_title == scraper_utils.normalize_title(ep_title):
                        return scraper_utils.pathify_url(ep_url)
开发者ID:EPiC-APOC,项目名称:repository.xvbmc,代码行数:33,代码来源:quikr_scraper.py


示例5: search

    def search(self, video_type, title, year, season=""):
        """Search the directory listing and return result dicts.

        Movies live under /Film/<year>/ (only searched when *year* is
        given); everything else is matched against the /Serial/
        directory by normalized-title substring.
        """
        results = []
        norm_title = scraper_utils.normalize_title(title)
        if video_type == VIDEO_TYPES.MOVIE:
            if year:
                base_url = urlparse.urljoin(self.base_url, "/Film/")
                html = self._http_get(base_url, cache_limit=48)
                for link in self.__parse_directory(html):
                    # the year directories are listed with the year as title
                    if year == link["title"]:
                        url = urlparse.urljoin(base_url, link["link"])
                        for movie in self.__get_files(url, cache_limit=24):
                            match_title, match_year, _height, _extra = scraper_utils.parse_movie_link(movie["link"])
                            if (
                                not movie["directory"]
                                and norm_title in scraper_utils.normalize_title(match_title)
                                and (not year or not match_year or year == match_year)
                            ):
                                # NOTE: the result url is the year directory,
                                # not the individual file link
                                result = {"url": scraper_utils.pathify_url(url), "title": match_title, "year": year}
                                results.append(result)
        else:
            base_url = urlparse.urljoin(self.base_url, "/Serial/")
            html = self._http_get(base_url, cache_limit=48)
            for link in self.__parse_directory(html):
                if link["directory"] and norm_title in scraper_utils.normalize_title(link["title"]):
                    url = urlparse.urljoin(base_url, link["link"])
                    result = {"url": scraper_utils.pathify_url(url), "title": link["title"], "year": ""}
                    results.append(result)

        return results
开发者ID:henry73,项目名称:salts,代码行数:29,代码来源:farda_scraper.py


示例6: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return the site-relative URL for the requested episode, or None.

     Tries the default SxxExx pattern against the episode list, then an
     airdate match (when enabled), then a normalized-title match.
     """
     url = scraper_utils.urljoin(self.base_url, show_url)
     html = self._http_get(url, cache_limit=2)
     episode_pattern = 'href="([^"]+-s0*%se0*%s(?!\d)[^"]*)' % (video.season, video.episode)
     parts = dom_parser2.parse_dom(html, 'ul', {'class': 'episode_list'})
     fragment = '\n'.join(part.content for part in parts)
     result = self._default_get_episode_url(fragment, video, episode_pattern)
     if result: return result

     # these three lists are paired positionally via zip below
     ep_urls = [r.attrs['href'] for r in dom_parser2.parse_dom(fragment, 'a', req='href')]
     ep_dates = [r.content for r in dom_parser2.parse_dom(fragment, 'span', {'class': 'episode_air_d'})]
     ep_titles = [r.content for r in dom_parser2.parse_dom(fragment, 'span', {'class': 'episode_name'})]
     force_title = scraper_utils.force_title(video)
     if not force_title and kodi.get_setting('airdate-fallback') == 'true' and video.ep_airdate:
         for ep_url, ep_date in zip(ep_urls, ep_dates):
             logger.log('Quikr Ep Airdate Matching: %s - %s - %s' % (ep_url, ep_date, video.ep_airdate), log_utils.LOGDEBUG)
             if video.ep_airdate == scraper_utils.to_datetime(ep_date, '%Y-%m-%d').date():
                 return scraper_utils.pathify_url(ep_url)

     if force_title or kodi.get_setting('title-fallback') == 'true':
         norm_title = scraper_utils.normalize_title(video.ep_title)
         for ep_url, ep_title in zip(ep_urls, ep_titles):
             # strip the leading <span>...</span> episode number
             ep_title = re.sub('<span>.*?</span>\s*', '', ep_title)
             logger.log('Quikr Ep Title Matching: %s - %s - %s' % (ep_url.encode('utf-8'), ep_title.encode('utf-8'), video.ep_title), log_utils.LOGDEBUG)
             if norm_title == scraper_utils.normalize_title(ep_title):
                 return scraper_utils.pathify_url(ep_url)
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:26,代码来源:quikr_scraper.py


示例7: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return the site-relative URL of the post matching *video*, or None.

     Pages through the show's posts via the 'nextpostslink' anchor,
     stopping once __too_old flags a post. Headings and post bodies are
     paired positionally. Matching is by release-name check, or by
     normalized-title fallback when title matching is forced.
     """
     force_title = scraper_utils.force_title(video)
     title_fallback = kodi.get_setting('title-fallback') == 'true'
     norm_title = scraper_utils.normalize_title(video.ep_title)
     page_url = [show_url]
     too_old = False
     while page_url and not too_old:
         url = scraper_utils.urljoin(self.base_url, page_url[0])
         html = self._http_get(url, require_debrid=True, cache_limit=1)
         # assumes headings and post divs appear in the same document order
         headings = re.findall('<h2>\s*<a\s+href="([^"]+)[^>]+>(.*?)</a>', html)
         posts = [r.content for r in dom_parser2.parse_dom(html, 'div', {'id': re.compile('post-\d+')})]
         for heading, post in zip(headings, posts):
             if self.__too_old(post):
                 too_old = True
                 break
             if CATEGORIES[VIDEO_TYPES.TVSHOW] in post and show_url in post:
                 url, title = heading
                 if not force_title:
                     if scraper_utils.release_check(video, title, require_title=False):
                         return scraper_utils.pathify_url(url)
                 else:
                     if title_fallback and norm_title:
                         match = re.search('<strong>(.*?)</strong>', post)
                         if match and norm_title == scraper_utils.normalize_title(match.group(1)):
                             return scraper_utils.pathify_url(url)

         page_url = dom_parser2.parse_dom(html, 'a', {'class': 'nextpostslink'}, req='href')
         if page_url: page_url = [page_url[0].attrs['href']]
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:28,代码来源:ddlvalley_scraper.py


示例8: search

 def search(self, video_type, title, year, season=''):  # @UnusedVariable
     """Search the site and return result dicts with 'url', 'title', 'year'.

     TV shows are probed via a direct /tv-show/<slug>/ URL (any matching
     post counts as a hit); movies go through the site's ?s= search and
     are matched by normalized title and, when both present, year.
     """
     results = []
     if video_type == VIDEO_TYPES.TVSHOW and title:
         test_url = '/tv-show/%s/' % (scraper_utils.to_slug(title))
         test_url = scraper_utils.urljoin(self.base_url, test_url)
         html = self._http_get(test_url, require_debrid=True, cache_limit=24)
         posts = dom_parser2.parse_dom(html, 'div', {'id': re.compile('post-\d+')})
         if posts:
             result = {'url': scraper_utils.pathify_url(test_url), 'title': scraper_utils.cleanse_title(title), 'year': ''}
             results.append(result)
     elif video_type == VIDEO_TYPES.MOVIE:
         # the site search appears to dislike punctuation in queries
         search_title = re.sub('[^A-Za-z0-9 ]', '', title.lower())
         html = self._http_get(self.base_url, params={'s': search_title}, require_debrid=True, cache_limit=1)
         norm_title = scraper_utils.normalize_title(title)
         for _attrs, post in dom_parser2.parse_dom(html, 'div', {'id': re.compile('post-\d+')}):
             match = re.search('<h\d+[^>]*>\s*<a\s+href="([^"]+)[^>]*>(.*?)</a>', post)
             if match:
                 post_url, post_title = match.groups()
                 if '/tv-show/' in post or self.__too_old(post): continue
                 post_title = re.sub('<[^>]*>', '', post_title)
                 meta = scraper_utils.parse_movie_link(post_title)
                 full_title = '%s [%s] (%sp)' % (meta['title'], meta['extra'], meta['height'])
                 match_year = meta['year']

                 match_norm_title = scraper_utils.normalize_title(meta['title'])
                 if (match_norm_title in norm_title or norm_title in match_norm_title) and (not year or not match_year or year == match_year):
                     result = {'url': scraper_utils.pathify_url(post_url), 'title': scraper_utils.cleanse_title(full_title), 'year': match_year}
                     results.append(result)

     return results
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:30,代码来源:myddl_scraper.py


示例9: search

    def search(self, video_type, title, year, season=''):  # @UnusedVariable
        """Search the site and return result dicts with 'url', 'title', 'year'.

        Scans two areas of the home page: the 'container seo' link list
        (no year available there) and the search result tables, matching
        by normalized-title substring and, when both present, year.
        """
        results = []
        if title:
            html = self._http_get(self.base_url, cache_limit=48)
            norm_title = scraper_utils.normalize_title(title)
            fragment = dom_parser2.parse_dom(html, 'div', {'class': 'container seo'})
            if fragment:
                match_year = ''
                for attrs, match_title in dom_parser2.parse_dom(fragment[0].content, 'a', {'class': 'link'}, req='href'):
                    if norm_title in scraper_utils.normalize_title(match_title) and (not year or not match_year or year == match_year):
                        result = {'url': scraper_utils.pathify_url(attrs['href']), 'title': scraper_utils.cleanse_title(match_title), 'year': match_year}
                        results.append(result)

            for _attrs, table in dom_parser2.parse_dom(html, 'table'):
                for _attrs, td in dom_parser2.parse_dom(table, 'td'):
                    match_url = dom_parser2.parse_dom(td, 'a', req='href')
                    match_title = dom_parser2.parse_dom(td, 'div', {'class': 'searchTVname'})
                    match_year = dom_parser2.parse_dom(td, 'span', {'class': 'right'})
                    if match_url and match_title:
                        match_url = match_url[0].attrs['href']
                        match_title = match_title[0].content
                        # the year span is optional on some cells
                        match_year = match_year[0].content if match_year else ''

                        if norm_title in scraper_utils.normalize_title(match_title) and (not year or not match_year or year == match_year):
                            result = {'url': scraper_utils.pathify_url(match_url), 'title': scraper_utils.cleanse_title(match_title), 'year': match_year}
                            results.append(result)

        return results
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:28,代码来源:tvrush_scraper.py


示例10: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return the site-relative URL of the post matching *video*, or None.

     Matches release titles against an SxxExx pattern or an airdate
     pattern (both tolerant of '.', '_' or ' ' separators), with a
     normalized-title fallback when title matching is forced. Pages via
     the 'nextpostslink' anchor until __too_old stops the walk.
     """
     sxe = '(\.|_| )S%02dE%02d(\.|_| )' % (int(video.season), int(video.episode))
     force_title = scraper_utils.force_title(video)
     title_fallback = kodi.get_setting('title-fallback') == 'true'
     norm_title = scraper_utils.normalize_title(video.ep_title)
     # ep_airdate may be unset/None; fall back to no airdate matching
     try: airdate_pattern = video.ep_airdate.strftime('(\.|_| )%Y(\.|_| )%m(\.|_| )%d(\.|_| )')
     except: airdate_pattern = ''

     page_url = [show_url]
     too_old = False
     while page_url and not too_old:
         url = urlparse.urljoin(self.base_url, page_url[0])
         html = self._http_get(url, require_debrid=True, cache_limit=1)
         posts = dom_parser.parse_dom(html, 'div', {'id': 'post-\d+'})
         for post in posts:
             if self.__too_old(post):
                 too_old = True
                 break
             if CATEGORIES[VIDEO_TYPES.TVSHOW] in post and show_url in post:
                 match = re.search('<a\s+href="([^"]+)[^>]+>(.*?)</a>', post)
                 if match:
                     url, title = match.groups()
                     if not force_title:
                         if re.search(sxe, title) or (airdate_pattern and re.search(airdate_pattern, title)):
                             return scraper_utils.pathify_url(url)
                     else:
                         if title_fallback and norm_title:
                             match = re.search('</strong>(.*?)</p>', post)
                             if match and norm_title == scraper_utils.normalize_title(match.group(1)):
                                 return scraper_utils.pathify_url(url)

         page_url = dom_parser.parse_dom(html, 'a', {'class': 'nextpostslink'}, ret='href')
开发者ID:freeworldxbmc,项目名称:KAOSbox-Repo,代码行数:32,代码来源:2ddl_scraper.py


示例11: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return a '?id=...' relative URL for the matching episode, or None.

     Looks the show up in the site's JSON API and matches episodes by
     season/episode number, then by airdate (shifted back one day --
     presumably a timezone offset, confirm), then by episode title.
     """
     query = scraper_utils.parse_query(show_url)
     if 'id' in query:
         url = scraper_utils.urljoin(self.base_url, '/api/v2/shows/%s' % (query['id']))
         js_data = self._http_get(url, cache_limit=.5)
         if 'episodes' in js_data:
             force_title = scraper_utils.force_title(video)
             if not force_title:
                 for episode in js_data['episodes']:
                     if int(video.season) == int(episode['season']) and int(video.episode) == int(episode['number']):
                         return scraper_utils.pathify_url('?id=%s' % (episode['id']))

                 if kodi.get_setting('airdate-fallback') == 'true' and video.ep_airdate:
                     for episode in js_data['episodes']:
                         if 'airdate' in episode:
                             ep_airdate = scraper_utils.to_datetime(episode['airdate'], "%Y-%m-%d").date()
                             # site airdates are one day ahead of trakt's
                             if video.ep_airdate == (ep_airdate - datetime.timedelta(days=1)):
                                 return scraper_utils.pathify_url('?id=%s' % (episode['id']))
             else:
                 logger.log('Skipping S&E matching as title search is forced on: %s' % (video.trakt_id), log_utils.LOGDEBUG)

             if (force_title or kodi.get_setting('title-fallback') == 'true') and video.ep_title:
                 norm_title = scraper_utils.normalize_title(video.ep_title)
                 for episode in js_data['episodes']:
                     if 'name' in episode and norm_title in scraper_utils.normalize_title(episode['name']):
                         return scraper_utils.pathify_url('?id=%s' % (episode['id']))
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:26,代码来源:ororotv_scraper.py


示例12: _default_get_episode_url

    def _default_get_episode_url(self, html, video, episode_pattern, title_pattern='', airdate_pattern=''):
        """Shared episode matcher used by concrete scrapers.

        Runs *episode_pattern* against *html* first, then (if enabled)
        an *airdate_pattern* with {year}/{month}/{day}-style placeholders
        substituted from video.ep_airdate, then a *title_pattern* whose
        named groups must include 'title' and 'url'. Returns the
        pathified URL of the first match, or None.
        """
        logger.log('Default Episode Url: |%s|%s|' % (self.get_name(), video), log_utils.LOGDEBUG)
        if not html: return

        # accept either a raw string or a dom_parser2 result list
        try: html = html[0].content
        except AttributeError: pass
        force_title = scraper_utils.force_title(video)
        if not force_title:
            if episode_pattern:
                match = re.search(episode_pattern, html, re.DOTALL | re.I)
                if match:
                    return scraper_utils.pathify_url(match.group(1))

            if kodi.get_setting('airdate-fallback') == 'true' and airdate_pattern and video.ep_airdate:
                # expand placeholder tokens; {p_*} variants are zero-padded
                airdate_pattern = airdate_pattern.replace('{year}', str(video.ep_airdate.year))
                airdate_pattern = airdate_pattern.replace('{month}', str(video.ep_airdate.month))
                airdate_pattern = airdate_pattern.replace('{p_month}', '%02d' % (video.ep_airdate.month))
                airdate_pattern = airdate_pattern.replace('{month_name}', MONTHS[video.ep_airdate.month - 1])
                airdate_pattern = airdate_pattern.replace('{short_month}', SHORT_MONS[video.ep_airdate.month - 1])
                airdate_pattern = airdate_pattern.replace('{day}', str(video.ep_airdate.day))
                airdate_pattern = airdate_pattern.replace('{p_day}', '%02d' % (video.ep_airdate.day))
                logger.log('Air Date Pattern: %s' % (airdate_pattern), log_utils.LOGDEBUG)

                match = re.search(airdate_pattern, html, re.DOTALL | re.I)
                if match:
                    return scraper_utils.pathify_url(match.group(1))
        else:
            logger.log('Skipping S&E matching as title search is forced on: %s' % (video.trakt_id), log_utils.LOGDEBUG)

        if (force_title or kodi.get_setting('title-fallback') == 'true') and video.ep_title and title_pattern:
            norm_title = scraper_utils.normalize_title(video.ep_title)
            for match in re.finditer(title_pattern, html, re.DOTALL | re.I):
                episode = match.groupdict()
                if norm_title == scraper_utils.normalize_title(episode['title']):
                    return scraper_utils.pathify_url(episode['url'])
开发者ID:CYBERxNUKE,项目名称:xbmc-addon,代码行数:35,代码来源:scraper.py


示例13: search

 def search(self, video_type, title, year):
     """Search the directory listing and return result dicts.

     Older variant without the *season* parameter. Movies are found
     under /Film/<year>/ (only when *year* is given); everything else is
     matched against /Serial/ by normalized-title substring.
     """
     results = []
     norm_title = scraper_utils.normalize_title(title)
     if video_type == VIDEO_TYPES.MOVIE:
         if year:
             base_url = urlparse.urljoin(self.base_url, '/Film/')
             html = self._http_get(base_url, cache_limit=48)
             for link in self.__parse_directory(html):
                 # the year directories are listed with the year as title
                 if year == link['title']:
                     url = urlparse.urljoin(base_url, link['link'])
                     for movie in self.__get_files(url, cache_limit=24):
                         match_title, match_year, _height, _extra = scraper_utils.parse_movie_link(movie['link'])
                         if not movie['directory'] and norm_title in scraper_utils.normalize_title(match_title) and (not year or not match_year or year == match_year):
                             # NOTE: the result url is the year directory,
                             # not the individual file link
                             result = {'url': scraper_utils.pathify_url(url), 'title': match_title, 'year': year}
                             results.append(result)
     else:
         base_url = urlparse.urljoin(self.base_url, '/Serial/')
         html = self._http_get(base_url, cache_limit=48)
         for link in self.__parse_directory(html):
             if link['directory'] and norm_title in scraper_utils.normalize_title(link['title']):
                 url = urlparse.urljoin(base_url, link['link'])
                 result = {'url': scraper_utils.pathify_url(url), 'title': link['title'], 'year': ''}
                 results.append(result)

     return results
开发者ID:azumimuo,项目名称:family-xbmc-addon,代码行数:25,代码来源:farda_scraper.py


示例14: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Return the site-relative URL of the post matching *video*, or None.

     Matches release titles against a dot-separated '.SxxExx.' token or
     a '.YYYY.MM.DD.' airdate token, with a normalized-title fallback
     when title matching is forced. Pages via the 'nextpostslink'
     anchor until __too_old stops the walk.
     """
     sxe = '.S%02dE%02d.' % (int(video.season), int(video.episode))
     force_title = scraper_utils.force_title(video)
     title_fallback = kodi.get_setting('title-fallback') == 'true'
     norm_title = scraper_utils.normalize_title(video.ep_title)
     # ep_airdate may be unset/None; fall back to no airdate matching
     try: ep_airdate = video.ep_airdate.strftime('.%Y.%m.%d.')
     except: ep_airdate = ''

     page_url = [show_url]
     too_old = False
     while page_url and not too_old:
         url = urlparse.urljoin(self.base_url, page_url[0])
         html = self._http_get(url, require_debrid=True, cache_limit=1)
         # assumes headings and post divs appear in the same document order
         headings = re.findall('<h2>\s*<a\s+href="([^"]+)[^>]+>(.*?)</a>', html)
         posts = dom_parser.parse_dom(html, 'div', {'id': 'post-\d+'})
         for heading, post in zip(headings, posts):
             if self.__too_old(post):
                 too_old = True
                 break
             if CATEGORIES[VIDEO_TYPES.TVSHOW] in post and show_url in post:
                 url, title = heading
                 if not force_title:
                     if (sxe in title) or (ep_airdate and ep_airdate in title):
                         return scraper_utils.pathify_url(url)
                 else:
                     if title_fallback and norm_title:
                         match = re.search('<strong>(.*?)</strong>', post)
                         if match and norm_title == scraper_utils.normalize_title(match.group(1)):
                             return scraper_utils.pathify_url(url)

         page_url = dom_parser.parse_dom(html, 'a', {'class': 'nextpostslink'}, ret='href')
开发者ID:monicarero,项目名称:repository.xvbmc,代码行数:31,代码来源:ddlvalley_scraper.py


示例15: _get_episode_url

    def _get_episode_url(self, show_url, video):
        """Return the site-relative URL for the requested episode, or None.

        Tries an SxxExx href pattern against the whole page, then an
        exact 'dd-mm-yyyy' airdate match per episode item, then a
        normalized episode-title match.
        """
        url = urlparse.urljoin(self.base_url, show_url)
        html = self._http_get(url, cache_limit=2)
        if html:
            force_title = scraper_utils.force_title(video)
            episodes = dom_parser.parse_dom(html, 'div', {'class': '\s*el-item\s*'})
            if not force_title:
                episode_pattern = 'href="([^"]*-[sS]%02d[eE]%02d(?!\d)[^"]*)' % (int(video.season), int(video.episode))
                match = re.search(episode_pattern, html)
                if match:
                    return scraper_utils.pathify_url(match.group(1))

                if kodi.get_setting('airdate-fallback') == 'true' and video.ep_airdate:
                    # the site renders dates day-first: dd-mm-yyyy
                    airdate_pattern = '%02d-%02d-%d' % (video.ep_airdate.day, video.ep_airdate.month, video.ep_airdate.year)
                    for episode in episodes:
                        ep_url = dom_parser.parse_dom(episode, 'a', ret='href')
                        ep_airdate = dom_parser.parse_dom(episode, 'div', {'class': 'date'})
                        if ep_url and ep_airdate:
                            ep_airdate = ep_airdate[0].strip()
                            if airdate_pattern == ep_airdate:
                                return scraper_utils.pathify_url(ep_url[0])

            if (force_title or kodi.get_setting('title-fallback') == 'true') and video.ep_title:
                norm_title = scraper_utils.normalize_title(video.ep_title)
                for episode in episodes:
                    ep_url = dom_parser.parse_dom(episode, 'a', ret='href')
                    ep_title = dom_parser.parse_dom(episode, 'div', {'class': 'e-name'})
                    if ep_url and ep_title and norm_title == scraper_utils.normalize_title(ep_title[0]):
                        return scraper_utils.pathify_url(ep_url[0])
开发者ID:monicarero,项目名称:repository.xvbmc,代码行数:29,代码来源:watchepisodes_scraper.py


示例16: _get_episode_url

 def _get_episode_url(self, season_url, video):
     """Resolve the relative URL for *video* from its season page.

     The season page itself plays episode 1; later episodes are found
     through the location.href redirect embedded in the page markup.
     Returns None when no match is found.
     """
     season_page = urlparse.urljoin(self.base_url, season_url)
     html = self._http_get(season_page, cache_limit=2)
     # Episode 1 is served directly by the season page.
     if int(video.episode) == 1:
         return scraper_utils.pathify_url(season_page)

     ep_pattern = 'location\.href=&quot;([^&]*season-%s[^/]*/%s)&quot;' % (video.season, video.episode)
     ep_match = re.search(ep_pattern, html)
     if ep_match:
         return scraper_utils.pathify_url(ep_match.group(1))
开发者ID:EPiC-APOC,项目名称:repository.xvbmc,代码行数:10,代码来源:hdmovie14_scraper.py


示例17: _get_episode_url

 def _get_episode_url(self, show_url, video):
     """Resolve the relative URL for *video* by probing its season page.

     Builds '<show_url>-season-N/' and fetches it without following
     redirects; a bare '/' response body is treated as a miss. The
     season page itself plays episode 1; later episodes are found via
     the embedded location.href redirect. Returns None on no match.
     """
     season_path = show_url + '-season-%s/' % (video.season)
     season_page = urlparse.urljoin(self.base_url, season_path)
     html = self._http_get(season_page, allow_redirect=False, cache_limit=.5)
     # A response of just '/' means the probe failed; give up early.
     if html == '/':
         return

     # Episode 1 is served directly by the season page.
     if int(video.episode) == 1:
         return scraper_utils.pathify_url(season_page)

     ep_pattern = 'location\.href=&quot;([^&]*season-%s/%s)&quot;' % (video.season, video.episode)
     ep_match = re.search(ep_pattern, html)
     if ep_match:
         return scraper_utils.pathify_url(ep_match.group(1))
开发者ID:c0ns0le,项目名称:YCBuilds,代码行数:12,代码来源:hdmovie14_scraper.py


示例18: search

 def search(self, video_type, title, year, season=''):
     """Search the site and return result dicts with 'url', 'title', 'year'.

     Entries whose title ends in 'Season N' are treated as seasons
     (optionally filtered to the requested *season* number); everything
     else is treated as a movie with an optional trailing year.
     """
     results = []
     search_url = urlparse.urljoin(self.base_url, '/?s=%s')
     search_url = search_url % (urllib.quote(title))
     html = self._http_get(search_url, cache_limit=1)
     for item in dom_parser.parse_dom(html, 'h3', {'class': 'post-box-title'}):
         match = re.search('href="([^"]+)[^>]*>([^<]+)', item)
         if match:
             match_url, match_title_year = match.groups()
             is_season = re.search('Season\s+(\d+)$', match_title_year, re.I)
             # BUG FIX: the original condition ended in
             # `is_season and VIDEO_TYPES.SEASON`, which tests the truthy
             # constant itself instead of comparing it to video_type, so
             # MOVIE searches also accepted season posts. Compare
             # explicitly and parenthesize to make precedence clear.
             if (not is_season and video_type == VIDEO_TYPES.MOVIE) or (is_season and video_type == VIDEO_TYPES.SEASON):
                 match_year = ''
                 if video_type == VIDEO_TYPES.SEASON:
                     match_title = match_title_year
                     # skip other seasons when a specific one was requested
                     if season and int(is_season.group(1)) != int(season):
                         continue
                 else:
                     # split a trailing 4-digit year off the movie title
                     match = re.search('(.*?)\s+(\d{4})$', match_title_year)
                     if match:
                         match_title, match_year = match.groups()
                     else:
                         match_title = match_title_year
                         match_year = ''

                 if not year or not match_year or year == match_year:
                     result = {'url': scraper_utils.pathify_url(match_url), 'title': scraper_utils.cleanse_title(match_title), 'year': match_year}
                     results.append(result)
     return results
开发者ID:Stevie-Bs,项目名称:repository.xvbmc,代码行数:28,代码来源:pubfilm_scraper.py


示例19: search

 def search(self, video_type, title, year, season=''):
     """Search the site and return result dicts with 'url', 'title', 'year'.

     Parses the 'listing-videos' result list, skipping SxxExx episode
     entries, splitting an optional '(YYYY)' year off each title and
     decoding a couple of common HTML entities.
     """
     results = []
     search_url = urlparse.urljoin(self.base_url, '/?s=')
     search_url += urllib.quote_plus(title)
     html = self._http_get(search_url, cache_limit=1)
     fragment = dom_parser.parse_dom(html, 'ul', {'class': '[^"]*listing-videos[^"]*'})
     if fragment:
         for match in re.finditer('href="([^"]+)[^>]*>(.*?)</a>', fragment[0]):
             url, match_title_year = match.groups('')
             match_title_year = re.sub('<span>|</span>', '', match_title_year)
             if re.search('S\d{2}E\d{2}', match_title_year): continue  # skip episodes
             match = re.search('(.*?)\s+\(?(\d{4})\)?', match_title_year)
             if match:
                 match_title, match_year = match.groups()
             else:
                 match_title = match_title_year
                 match_year = ''
             # decode en-dash and right single quote entities
             match_title = match_title.replace('&#8211;', '-')
             match_title = match_title.replace('&#8217;', "'")

             if (not year or not match_year or year == match_year):
                 result = {'url': scraper_utils.pathify_url(url), 'title': match_title, 'year': match_year}
                 results.append(result)

     return results
开发者ID:c0ns0le,项目名称:YCBuilds,代码行数:25,代码来源:viewmovies_scraper.py


示例20: search

    def search(self, video_type, title, year, season=''):
        """Search the site and return result dicts with 'url', 'title', 'year'.

        Walks the 'item' divs from the ?s= search page, skipping entries
        whose quality badge is excluded and Turkish 'SEZON' (season)
        titles, stripping TITLE_STRIP words from each title, and
        filtering by year when both sides have one.
        """
        results = []
        search_url = urlparse.urljoin(self.base_url, '/?s=')
        search_url += urllib.quote_plus(title)
        html = self._http_get(search_url, cache_limit=8)
        # Python 2: decode the byte-string strip words once up front
        title_strip = [word.decode('utf-8') for word in TITLE_STRIP]
        for item in dom_parser.parse_dom(html, 'div', {'class': 'item'}):
            match_url = re.search('href="([^"]+)', item)
            match_title = dom_parser.parse_dom(item, 'span', {'class': 'tt'})
            if match_url and match_title:
                item_type = dom_parser.parse_dom(item, 'span', {'class': 'calidad2'})
                if item_type and item_type[0] in SEARCH_EXCLUDE: continue
                match_url = match_url.group(1)
                match_title = match_title[0]
                if 'SEZON' in match_title.upper(): continue

                year_frag = dom_parser.parse_dom(item, 'span', {'class': 'year'})
                if year_frag:
                    match_year = year_frag[0]
                else:
                    match_year = ''

                match_title = ' '.join([word for word in match_title.split() if word.upper() not in title_strip])
                if (not year or not match_year or year == match_year):
                    result = {'url': scraper_utils.pathify_url(match_url), 'title': scraper_utils.cleanse_title(match_title), 'year': match_year}
                    results.append(result)

        return results
开发者ID:kevintone,项目名称:tdbaddon,代码行数:28,代码来源:dizifilmhd_scraper.py



注:本文中的salts_lib.scraper_utils.pathify_url函数示例由纯净天空整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。


鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
上一篇:
Python scraper_utils.urljoin函数代码示例发布时间:2022-05-27
下一篇:
Python scraper_utils.parse_json函数代码示例发布时间:2022-05-27
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap