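Earlier this year I wrote a PHP post, "[PHP] Implementing a Simple Bot with the Official Plurk API: Saving Karma with a Bot, Using Yahoo News as an Example". At the end of the year I wanted to reuse it as an Android practice project. After struggling with it for quite a while, I realized my Java was very rusty, so I went back to practicing plain Java first. The two things to figure out are how to use regular expressions and how to handle network connections.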
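The regular-expression half can be tried on its own before touching the network. This is a minimal sketch that assumes Java 7 or later. The class RegexGroupsDemo, the method extract and the sample HTML are hypothetical and were made up for illustration. It reuses the same headline pattern as the full program. find() walks through every match, group(1) holds the link URL and group(2) holds the link text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroupsDemo {
    // Same pattern as the full program: group 1 = link URL, group 2 = link text.
    // CASE_INSENSITIVE also accepts <H3>/<A HREF=...>; DOTALL lets "." cross line breaks.
    static final Pattern HEADLINE = Pattern.compile(
            "<h3[^>]*>[^<]*<a href=\"(.*?)\"[^>]*>(.*?)</a></h3>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one {url, title} pair per headline, in document order
    static List<String[]> extract(String html) {
        List<String[]> out = new ArrayList<>();
        Matcher m = HEADLINE.matcher(html);
        while (m.find()) {               // find() scans forward to the next match
            out.add(new String[] { m.group(1), m.group(2) });
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, hard-coded so no network access is needed
        String html = "<div><h3><a href=\"news/1.html\" title=\"t1\">First headline</a></h3>"
                + "<p>ad</p><H3 class=\"x\"><A HREF=\"news/2.html\">Second</A></H3></div>";
        for (String[] h : extract(html)) {
            System.out.println("URL: " + h[0] + " | Title: " + h[1]);
        }
    }
}
```

The second sample headline uses upper-case tags. It still matches because of CASE_INSENSITIVE.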
A simple example that combines the two:
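import java.io.*;
import java.util.regex.*;
import java.net.HttpURLConnection;
import java.net.URL;

class Test
{
    public static void report( String message )
    {
        System.out.println( "[Report] " + message );
    }

    public static void main( String[] args )
    {
        HttpURLConnection con = null;
        try
        {
            // Toggle trick: "//*" is an ordinary line comment, so the fetch code below is live.
            // Delete one "/" (turning it into "/*") to comment out everything up to "// */",
            // then uncomment the FileInputStream line to read the cached fetch.html instead.
            //*
            URL url = new URL( "http://tw.yahoo.com" );
            con = (HttpURLConnection) url.openConnection();
            con.setReadTimeout( 10000 );     // read timeout: 10 s
            con.setConnectTimeout( 15000 );  // connect timeout: 15 s
            con.setRequestMethod( "GET" );
            // Identify as a desktop browser so the site serves its regular page
            con.addRequestProperty( "User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-GB; rv:1.9.2.9) Gecko/20100824 Firefox/3.6.9" );
            con.setDoInput( true );
            con.connect();
            BufferedReader reader = new BufferedReader( new InputStreamReader( con.getInputStream(), "UTF-8" ) );
            // */
            // BufferedReader reader = new BufferedReader( new InputStreamReader( new FileInputStream( "fetch.html" ), "UTF-8" ) );

            // Read the whole page into one string. readLine() drops the line breaks;
            // StringBuilder avoids the quadratic cost of += inside a loop.
            StringBuilder sb = new StringBuilder();
            String n;
            try
            {
                while( ( n = reader.readLine() ) != null )
                    sb.append( n );
            }
            finally
            {
                reader.close();
            }
            String result = sb.toString();
            // To cache the page for offline testing:
            // BufferedWriter out = new BufferedWriter( new FileWriter( "fetch.html" ) ); out.write( result ); out.close();
            // System.out.println( "Result:" + result );

            // Cut the page down to the headline block: from the start marker up to the end marker
            String pattern;
            int at;
            pattern = "<label class=\"img-border clearfix\">";
            if( ( at = result.indexOf( pattern ) ) < 0 )
            {
                report( "format error 1" );   // start marker missing: the page layout has changed
                return;
            }
            result = result.substring( at );
            pattern = "<ol class=\"newsad clearfix\">";
            if( ( at = result.indexOf( pattern ) ) < 0 )
            {
                report( "format error 2" );   // end marker missing
                return;
            }
            result = result.substring( 0, at );

            // Each headline looks like <h3 ...><a href="URL" ...>TITLE</a></h3>.
            // Group 1 captures the URL and group 2 the title; the reluctant (.*?) groups match as little as possible.
            // CASE_INSENSITIVE also accepts upper-case tags; DOTALL lets "." match line terminators.
            pattern = "<h3[^>]*>[^<]*<a href=\"(.*?)\"[^>]*>(.*?)</a></h3>";
            Pattern p = Pattern.compile( pattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL );
            Matcher m = p.matcher( result );
            while( m.find() )
            {
                report( "\n==== Get === \n" + m.group() + "\n" );
                report( "URL: " + m.group( 1 ) );
                report( "Title: " + m.group( 2 ) );
            }
        }
        catch( Exception e )
        {
            report( "Error:" + e );
        }
        finally
        {
            // Runs even after the early returns above
            if( con != null )
                con.disconnect();
        }
    }
}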
Output:
$ javac Test.java && java Test
[Report]
==== Get ===
<h3><a href="news/a/h1/t/*http://tw.news.yahoo.com/article/url/d/a/101214/5/2iy8o.html" title="最新!美元弱 新台幣升破30">最新!美元弱 新台幣升破30</a></h3>
[Report] URL: news/a/h1/t/*http://tw.news.yahoo.com/article/url/d/a/101214/5/2iy8o.html
[Report] Title: 最新!美元弱 新台幣升破30
[Report]
==== Get ===
<h3><a href="news/a/h2/t/*http://tw.news.yahoo.com/article/url/d/a/101214/2/2iy6j.html" title="情人多愛你 手碰觸方式露餡">情人多愛你 手碰觸方式露餡</a></h3>
[Report] URL: news/a/h2/t/*http://tw.news.yahoo.com/article/url/d/a/101214/2/2iy6j.html
[Report] Title: 情人多愛你 手碰觸方式露餡
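The reading and slicing steps can also be written more defensively. This is a minimal sketch that assumes Java 7 or later for try-with-resources. PageSlicer, readAll and between are hypothetical names, and a ByteArrayInputStream stands in for con.getInputStream() so the sketch runs without a network connection. readAll closes the reader even if an exception is thrown. between repeats the two indexOf checks and returns null in the cases where the original reports "format error".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class PageSlicer {
    // Reads the whole stream as UTF-8 text. try-with-resources closes the reader
    // (and the underlying stream) even if readLine() throws. As in the original,
    // line breaks are dropped; StringBuilder avoids repeated string copying.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    // Returns the text from the start of startMarker up to (not including) endMarker,
    // or null if either marker is missing.
    static String between(String page, String startMarker, String endMarker) {
        int from = page.indexOf(startMarker);
        if (from < 0) return null;
        int to = page.indexOf(endMarker, from);   // search only after the start marker
        if (to < 0) return null;
        return page.substring(from, to);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page; the ByteArrayInputStream replaces the HTTP response stream
        String html = "<html>\n<label class=\"x\">\n<h3>news</h3>\n<ol class=\"end\">ads</ol>\n</html>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        String page = readAll(in);
        System.out.println(between(page, "<label class=\"x\">", "<ol class=\"end\">"));
    }
}
```

In the full program, the equivalent calls would be readAll(con.getInputStream()) followed by between(page, "<label class=\"img-border clearfix\">", "<ol class=\"newsad clearfix\">"). A null result would then be treated as a format error.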
Honestly, if I count it up, five years ago Java was the language I used most, for well over a year. Now I have forgotten almost all of it!