This is an announcement of the workshop of the Institute of Statistical
Mathematics (統計数理研究所, ISM) cooperative research project
「Rの整備と利用」 (Development and Use of R). (It was announced earlier;
this version adds the titles and abstracts of the English talks, which
had not yet been decided.)

In addition, Professor Ripley and Dr. Goto have agreed to give
additional talks on separate days. All talks are free and open to
everyone, and we look forward to seeing many of you there.

<<Additional talks>>

● "Visualization for data mining"
  by Brian Ripley (Univ. of Oxford, UK)

  Date:  Thursday, 7 December 2006, 10:30 - 11:30 (may run until 12:00)
  Place: Institute of Statistical Mathematics, New Building, 2nd floor,
         Seminar Room (研修室)

  Abstract:
  The talk will start with an introduction to data mining, and the
  types of data and questions that occur in that field. It will then
  consider ways to visualize such high-dimensional datasets via, for
  example, projection pursuit and multidimensional scaling, and show
  some examples from real-world consulting problems using R and GGobi.

● "GotoBLAS Tutorial" (GotoBLAS チュートリアル)
  by Kazushige Goto (後藤和茂) (Univ. of Texas, US)

  Date:  Monday, 11 December 2006, 10:30 - 11:30 (may run until 12:00)
  Place: Institute of Statistical Mathematics, New Building, 2nd floor,
         Seminar Room (研修室)

  Abstract:
  GotoBLAS is one of the optimized BLAS implementations that support
  multiple architectures, but the various optimization techniques
  applied in GotoBLAS are not widely known. This talk therefore
  explains the basic ideas behind optimization, and discusses the
  characteristics of BLAS Level 1 through Level 3 and the points to
  watch when calling these functions.

Organizers:
  Shigeru Mase (間瀬茂) (Tokyo Institute of Technology)
  Junji Nakano (中野純司) (Institute of Statistical Mathematics)

-----------------------------------------------------

Workshop of the ISM cooperative research project
「Rの整備と利用」 (Development and Use of R)

Date:     Friday, 8 December 2006, 9:30 - 17:00
          Saturday, 9 December 2006, 9:30 - 16:20
Place:    Institute of Statistical Mathematics, Auditorium (講堂)
Map:      http://www.ism.ac.jp/access/index_j.html
Fee:      Free
Language: English on 8 December, Japanese on 9 December

■ Program

● 8 December (abstracts of the talks are given below)

09:30 - 10:30
  Brian Ripley (University of Oxford, UK)
  Title: Software for Statistical Developments

10:45 - 11:45
  Kazushige Goto (The University of Texas, US)
  Title: Various optimization and performance tips for processors

13:30 - 14:30
  Stefano Iacus (Universita degli Studi di Milano, Italy)
  Title: Two R specific optimization techniques for speed and data
         management

14:45 - 15:45
  Sungwoo Park (Pohang University of Science and Technology, Korea)
  Title: A critique of R from the perspective of programming language
         theory

16:00 - 17:00
  Junji Nakano (ISM, Japan) and Ei-ji Nakama (COM-ONE Inc., Japan)
  Title: R on super-computers at ISM

(18:00 - Dinner)

● 9 December

09:30 - 10:15
  Nobuo Funao (舟尾暢男)
  Takeda Pharmaceutical Co., Ltd. (武田薬品工業(株)), Pharmaceutical
  Development Division, Japan Development Center, Statistical Analysis
  Department, Statistics Group
  "Data handling in R: 30-minute cooking with data frames"

10:15 - 11:00
  Takuya Kubo (久保拓弥)
  Hokkaido University, Faculty of Environmental Earth Science,
  Division of Environmental Biology, Terrestrial Ecology
  "An R user wandering around MCMC computation"

11:00 - 11:45
  Ryota Suzuki (鈴木了太)
  ef-prime, Inc.
  "Starting a business with R! The current state of free software and
  the data analysis business"

(11:45 - 13:00 Lunch break)

13:00 - 13:45
  Osamu Ogasawara (小笠原理) (National Institute of Genetics, Center
  for Information Biology and DDBJ)
  服部恵美 ((株)情報数理研究所)
  三十尾潔高 ((株)情報数理研究所)
  "Development of R graphical manuals and future directions"

13:45 - 14:30
  Fumihiko Makiyama (牧山文彦)
  Keiaikai Medical Corporation (特定医療法人敬愛会), Chibana Clinic,
  Health Management Center
  "GoogleEarth and the R language"

(14:30 - 14:50 Break)

14:50 - 15:35
  Susumu Tanimura (谷村晋)
  Institute of Tropical Medicine, Nagasaki University, Department of
  Social Environment
  "A common foundation for vector data models in geospatial analysis:
  classes and methods of the sp package"

15:35 - 16:20
  Shigeru Mase (間瀬茂)
  Tokyo Institute of Technology, Graduate School of Information
  Science and Engineering, Department of Mathematical and Computing
  Sciences
  "An open-source manuscript for introducing R"

■ Abstracts of talks on 8 December:

Title: Software for Statistical Developments
Speaker: Brian Ripley

To make statistical methodology available to end-users, especially to
those outside statistics, we need to make software available that
implements the method. That software should ideally be readily
available, easy to use, flexible (as R is) and work correctly (and R
has a great record for rapidly fixing bugs). The talk will discuss the
process of moving new statistical methodology from ideas to practical
implementation(s) (with R providing a near-ideal vehicle for a
reference implementation), illustrated by some case studies of
state-of-the-art statistical analyses made possible by having code in
S or R.

Title: Various optimization and performance tips for processors
Speaker: Kazushige Goto

Most developers don't like doing optimization, because they believe
it's the compiler's job, or that optimization makes their code dirty.
Unfortunately, they confuse algorithmic optimization with instruction
optimization. Actually, you don't have to do instruction optimizations
that make your code dirty. Instead, you need to think of a good
algorithm that avoids the various traps that CPUs hate. This talk will
discuss the keys to making your application perform better and how to
avoid poor coding.

Title: Two R specific optimization techniques for speed and data
       management
Speaker: Stefano Iacus

There are two aspects of R programming that can make the
implementation of some algorithms inefficient. One is related to the
interactive "function based" OOP design of the R language (in contrast
to "class based" OOP languages like Java, etc.) and the other is
related to the fact that R normally stores its objects in memory
(although something new is on the horizon). In Monte Carlo analysis
the natural need is to iterate some procedure, which means making use
of a ``for'' loop. Because of F-OOP this might be too inefficient with
respect to speed. On the other hand, updating big objects (data
frames, distance matrices, DNA sequence data) stored in memory might
also become inefficient under some circumstances, for reasons of speed
and memory management. With the help of two concrete example packages
we present two natural, efficient ways to approach the above-mentioned
situations, which may be of general interest to (advanced) R users.

Title: A critique of R from the perspective of programming language
       theory
Speaker: Sungwoo Park

R is a programming system which provides rich support for statistical
computing and high-level graphics. Despite its popularity in the
statistics community, however, R, as a programming language, has quite
a few flaws in its design. For example, R allows the lazy evaluation
strategy in the presence of computational effects, such as
assignments, vector updates, and graphics output, with no provisions
for the resulting strange semantics. In fact, combining the lazy
evaluation strategy with computational effects in an unobtrusive way
has been one of the key research problems in the programming language
community for many years. The definition of R is also far from
acceptable from the viewpoint of programming language theory.
For example, specific implementation strategies are taken as part of
the definition, while part of the definition is delegated to
implementation strategies.

In this talk, we give a critique of R from the perspective of
programming language theory. We first criticize the negative aspects
of R and then highlight its positive aspects. We analyze R as a
programming language rather than as a programming system, thereby
focusing on its semantic elements rather than its statistics/graphics
library. As an alternative linguistic framework to R, we propose
functional languages, which provide all the features provided by R and
also come with formal semantics.

Title: R on super-computers at ISM
Speakers: Junji Nakano and Ei-ji Nakama

The Institute of Statistical Mathematics (ISM) is a (national)
research institute for statistical sciences and provides several
super-computer systems for statistical research by Japanese
statisticians. We use R on these super-computers and have made several
parallel computing functions available on them. These include parallel
BLAS replacements such as ATLAS and the GOTO library, and the snow
package for implementing parallel R functions using MPI and other
distributed computing techniques. We have also implemented a Web
environment that makes parallel R easy to use.