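Dear database colleagues,
COVID-19 has subsided compared with before (although cases in Tokyo have been rising for nine consecutive weeks!), and I imagine your exchanges with colleagues overseas are increasing (though I myself am refraining from traveling).
September 11, 16:30-  Venue: Komaba
The talk will also be streamed on Zoom. Whether you wish to attend via Zoom or in person, please contact yoko(a)tkl.iis.u-tokyo.ac.jp.
The Zoom address will be sent to those who register.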
Masaru Kitsuregawa (喜連川優)
-----
Information Integration in Social Science Research: Advances with LLMs
in the NAIP Project
Abstract—NAIP is a collaborative project between social scientists and
computer scientists to facilitate and promote data-driven research on
nonprofit organizations. A major thrust of NAIP is the treatment of
socioeconomic datasets such as IRS Form 990 and census data for
statistical analysis on nonprofits. This paper describes recent advances
and experiences using Large Language Models (LLMs) to facilitate data
cleaning, modeling, and automated SQL query generation on NAIP datasets.
With non-trivial prompt engineering, LLMs become capable of generating
useful SQL queries, but significant effort is still needed for
evaluation and validation.