Salient Object Detection

Introduction

This post gives a brief explanation of Salient Object Detection and introduces a Python library that applies it.

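What is Salient Object Detection?

"Salient" means "standing out". Salient Object Detection (SOD) is the technique of detecting the object that a person would most likely focus on when looking at an image.

The figure on the left (quoted from here) shows the result of running SOD with a certain algorithm. The left column contains the original images and the right column the detection results, which are shown here as mask images. The detected regions match where a viewer's eye goes first.

Research on SOD dates back to before the 1980s, but it seems to have drawn wide attention only after deep learning (convolutional neural networks) was introduced in 2015 and accuracy improved substantially. In the 2020s, Transformer-based SOD appeared, and accuracy seems to have improved further.

SOD has to detect salient objects regardless of their size. Emphasizing local features of the image causes large objects to be missed, while emphasizing global features causes small objects to be missed. Large and small salient objects can also coexist in a single image. Accurate SOD therefore needs to capture all of the following features.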

Local features of the image

Global features of the image

Multi-scale features (features at various scales)

The main thrust of SOD research is to incorporate these features efficiently (short training time, modest computing resources) while raising accuracy.
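To make the three kinds of features concrete, here is a toy sketch that scores each pixel by how much its neighbourhood mean differs from the global mean, averaged over several neighbourhood sizes. It is only an illustration of "local", "global", and "multi-scale"; it is not the method of any particular SOD paper, and the function names and the radii (1, 4, 16) are assumptions chosen for the example.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2*radius+1)^2 window, computed with an integral image."""
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = image.shape
    window_sum = (
        integral[size:size + h, size:size + w]
        - integral[:h, size:size + w]
        - integral[size:size + h, :w]
        + integral[:h, :w]
    )
    return window_sum / (size * size)


def multiscale_contrast(gray: np.ndarray, radii=(1, 4, 16)) -> np.ndarray:
    """Average over several scales of |local mean - global mean|, scaled to [0, 1]."""
    gray = gray.astype(np.float64)
    global_mean = gray.mean()  # global feature: one value for the whole image
    saliency = np.zeros_like(gray)
    for radius in radii:  # multi-scale: small to large neighbourhoods
        local_mean = box_blur(gray, radius)  # local feature at this scale
        saliency += np.abs(local_mean - global_mean)
    saliency /= len(radii)
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

With a small radius only, a large uniform object looks salient just along its border; with a large radius only, a small object is averaged away. Combining scales is the simplest way to see why both ends are needed.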

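Applications of Salient Object Detection

Looking at the figure above, SOD can also be read as removing the background from an image and extracting only the foreground. The Python background-removal library rembg (MIT license) exploits this property, and its usage is introduced here. The SOD paper that rembg is based on is this one. The figure above is also quoted from that paper.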
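The link between a saliency mask and background removal can be sketched in a few lines: if the mask is used as the alpha channel, non-salient pixels become transparent. The snippet below is a minimal illustration under that assumption. `apply_mask` is a hypothetical helper, and this is not a claim about how rembg is implemented internally.

```python
import numpy as np


def apply_mask(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Attach a saliency mask (values in [0, 1]) to an RGB image as its alpha channel.

    rgb:  uint8 array of shape (H, W, 3)
    mask: float array of shape (H, W); 1 = salient, 0 = background
    """
    if rgb.shape[:2] != mask.shape:
        raise ValueError("rgb and mask must have the same height and width")
    alpha = np.clip(mask, 0.0, 1.0) * 255.0
    # Result is an RGBA image: colour is untouched, only transparency changes.
    return np.dstack([rgb, alpha.round().astype(np.uint8)])
```

The returned array can be turned into a picture with `PIL.Image.fromarray(result, mode="RGBA")` and saved as PNG, which is also why the program below writes its output as `.png`.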

Environment setup

Everything was verified on Ubuntu 24.04 installed under WSL on Windows 11, with Python 3.11.7. The libraries were installed with the Python package manager Poetry. The required libraries are as follows.

$> poetry add llvmlite@latest
$> poetry add numba@latest
$> poetry add onnxruntime
$> poetry add rembg

Code

The code is shown below. The source code is available here.

import glob
import os

from PIL import Image
from rembg import remove

if __name__ == "__main__":

    # Set input paths and output directory path
    input_paths = glob.glob("/home/kumada/data/rembg/inputs/*")
    output_dir_path = "/home/kumada/data/rembg/outputs"

    # Create output directory if not exists
    if not os.path.exists(output_dir_path):
        os.makedirs(output_dir_path)

    for input_path in input_paths:
        print(f"> input_path: {input_path}")
        src_image = Image.open(input_path)

        # Remove background
        dst_image = remove(src_image)

        basename = os.path.basename(input_path)
        head, _ = os.path.splitext(basename)
        output_path = os.path.join(output_dir_path, f"{head}.png")
        print(f"> output_path: {output_path}")

        # Save image
        dst_image.save(output_path)  # type: ignore

        print("Done.")

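This program batch-processes all images in a folder.

Line 10: collects the paths of all images in the input folder.

Line 22: removes the background.

Line 30: saves the resulting image.

There is nothing particularly difficult here.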
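One practical caveat: the `*` pattern on line 10 matches every entry in the folder, so a stray text file or subdirectory would make `Image.open` fail. A possible guard, sketched here with a hypothetical helper `list_image_paths` and an assumed set of file extensions, is to keep only regular files whose suffix looks like an image.

```python
from pathlib import Path

# Assumed set of extensions for this example; adjust to the formats actually in use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_image_paths(input_dir: str) -> list[Path]:
    """Return the image files directly under input_dir, sorted by path."""
    return sorted(
        path
        for path in Path(input_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

The result could be used in place of the `glob.glob(...)` call, since `Image.open` accepts `Path` objects as well as strings.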

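Results

The results are shown below, with the original image on the left and the result image on the right. All images used here were downloaded from ぱくたそ (Pakutaso), a free stock site for photos and AI-generated images.

When an image contains several salient objects, a reasonable approach is to crop a rectangle containing the target object and apply rembg to that crop.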
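The crop-then-remove idea could look roughly like this. `remove_background_in_box` is a hypothetical helper written for this post, the box coordinates are something the user has to supply, and the `remover` parameter exists only so that the function can be tried with a stand-in when rembg is not installed.

```python
from PIL import Image


def remove_background_in_box(image, box, remover=None):
    """Crop box = (left, upper, right, lower) from image, then remove the crop's background."""
    if remover is None:
        # Imported lazily; by default this is the same rembg.remove used in the program above.
        from rembg import remove as remover
    cropped = image.crop(box)
    return remover(cropped)
```

For example, `remove_background_in_box(Image.open(input_path), (100, 50, 400, 350))` would process only the region between (100, 50) and (400, 350); these numbers are placeholders. The output has the size of the crop, not of the original image.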

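Summary

This post gave an overview of SOD and one application of it. The background-removal tool rembg served as the example, but to reach the level of something that could ship in Adobe software, it would need editing features for fine touch-ups. rembg does not offer that. In what other fields could SOD be applied in a way that also makes money?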

Kumada Seiya

