Dear fellow database people,
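COVID has calmed down compared with before (although in Tokyo cases have now risen for nine consecutive weeks!), and I imagine many of you are having more exchanges with colleagues overseas again (though I myself am refraining from traveling abroad).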
Date: September 11, from 16:30. Venue: Komaba.
The talk will also be streamed on Zoom. Whether you wish to join via Zoom or in person, please contact yoko@tkl.iis.u-tokyo.ac.jp.
We will send the Zoom address to everyone who registers.
Masaru Kitsuregawa (喜連川優)
-----
Information Integration in Social Science Research: Advances with LLMs in the NAIP Project
Abstract: NAIP is a collaborative project between social scientists and computer scientists to facilitate and promote data-driven research on nonprofit organizations. A major thrust of NAIP is the treatment of socioeconomic datasets, such as IRS Form 990 filings and census data, for statistical analysis of nonprofits. This paper describes recent advances and experiences in using Large Language Models (LLMs) to facilitate data cleaning, data modeling, and automated SQL query generation on NAIP datasets. With non-trivial prompt engineering, LLMs become capable of generating useful SQL queries, but significant effort is still needed for evaluation and validation.