Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - Can't establish Jsoup connection when passing URL String in a certain way

I have a really strange problem.

I am running a Spring application where I basically just spawn some threads and, within those threads, try to connect to a website and extract the status code of the response. Nothing special, but I have encountered a problem that really confuses me.

I have the following code:

    @Override
    public void run() {

        Document document;
        Connection.Response response;
        String link = "https://lu.vpbank.com/htm/752/de_LU/Stellenangebote.htm";
        System.out.println(link);
        System.out.println(this.site.getLink());

        // Works fine
        try {
            response = Jsoup.connect(link).followRedirects(false).ignoreHttpErrors(true).execute();
            System.out.println(response.statusCode());
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Does not work
        try {
            response = Jsoup.connect(this.site.getLink()).followRedirects(false).ignoreHttpErrors(true).execute();
            System.out.println(response.statusCode());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

The first attempt to create the connection works, because I declare the content of the string within the method.

On the second attempt, I get the URL string from an object that I created previously, with the URL fetched from a database. This throws an error.

The console output is:

https://lu.vpbank.com/htm/752/de_LU/Stellenangebote.htm
404
https://www.vpbank.lu/htm/752/de_LU/Stellenangebote.htm
javax.net.ssl.SSLException: Connection reset
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:369)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:312)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
    at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1680)
    at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1054)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
    at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
    at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:754)
    at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:689)
    at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:713)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1623)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1528)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:736)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:707)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:297)
    at net.candidatis.tierone.crawls.careersite.CrawlableCareerBasic.run(CrawlableCareerBasic.java:48)
    at net.candidatis.tierone.controllers.TestController.testCrawl(TestController.java:32)
    at net.candidatis.tierone.TieroneApplication.run(TieroneApplication.java:36)
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:804)
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:788)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:333)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1309)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1298)
    at net.candidatis.tierone.TieroneApplication.main(TieroneApplication.java:27)
    Suppressed: java.net.SocketException: Broken pipe
        at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
        at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
        at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
        at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1051)
        at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:82)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:400)
        ... 26 more
Caused by: java.net.SocketException: Connection reset
    at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323)
    at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
    at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
    at java.base/java.net.Socket$SocketInputStream.read(Socket.java:981)
    at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
    at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472)
    at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
    at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1434)
    at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1038)
    ... 22 more

As we can see in the console output, the URL is identical.

site is just a simple object that I create before launching the threads.

import lombok.Data;

@Data
public class Site {
    private final String link;
}

Does anyone have an idea what might be causing this error?



1 Reply


"As we can see in the console output, the URL is identical."

But they are not identical. They have different hostnames (lu.vpbank.com versus www.vpbank.lu), which is why you see different behavior:

https://lu.vpbank.com/htm/752/de_LU/Stellenangebote.htm
https://www.vpbank.lu/htm/752/de_LU/Stellenangebote.htm
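
Since two URLs like these are easy to misread as the same, a quick programmatic comparison of the host parts can help. The following is only a minimal sketch using the standard java.net.URI class. The class name HostCheck and the helpers hostOf and sameHost are hypothetical names made up for illustration; they are not part of Jsoup or of the code in the question.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class HostCheck {

    /** Returns the host part of a URL string, e.g. "www.vpbank.lu". */
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** True when both URL strings point at the same host (case-insensitive). */
    static boolean sameHost(String first, String second) {
        String a = hostOf(first);
        String b = hostOf(second);
        return a != null && a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // The two URLs from the console output in the question.
        String hardCoded = "https://lu.vpbank.com/htm/752/de_LU/Stellenangebote.htm";
        String fromDatabase = "https://www.vpbank.lu/htm/752/de_LU/Stellenangebote.htm";

        System.out.println(hostOf(hardCoded));               // lu.vpbank.com
        System.out.println(hostOf(fromDatabase));            // www.vpbank.lu
        System.out.println(sameHost(hardCoded, fromDatabase)); // false
    }
}
```

Logging link.equals(this.site.getLink()) next to the two printed URLs in run() would have surfaced the mismatch in the same way.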

In a browser, the second URL redirects to the first. My guess is that the www.vpbank.lu server has a different TLS setup, or validates the connection differently (perhaps it requires certain headers), and that is why you get the connection reset error. But that's a separate issue.

(As an aside, thank you for providing enough detail including printing the actual URL you're trying to visit - makes it easy to help with a fresh set of eyes!)

