Installing Namazu


Namazu is a Japanese full-text search system developed by Satoru Takabayashi of Aichi University.

Besides Linux, Namazu also runs on Windows and OS/2.

I am convinced it is an excellent choice for building a search engine for Japanese text.

hero-island uses Namazu as well.

Namazu: downloaded namazu-2.0.17.tar.gz (0.98 MB) from http://www.namazu.org/.

KAKASI: downloaded KAKASI-2.3.4 (1 MB) from http://kakasi.namazu.org/.

Download all files into /usr/local/src.
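Before starting, it can help to confirm that both archives are actually in /usr/local/src. The following is a minimal sketch and not part of the original procedure. The helper name missing_sources is hypothetical, and the archive file names are assumed from the versions mentioned above, so adjust them if your downloads are named differently.

```shell
#!/bin/sh
# Sketch: list expected source archives that are missing from a directory.
# missing_sources is a hypothetical helper; the archive names are assumptions
# based on the versions mentioned in the text.
missing_sources() {
    dir=$1
    for f in kakasi-2.3.4.tar.gz namazu-2.0.17.tar.gz; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

missing=$(missing_sources /usr/local/src)
if [ -n "$missing" ]; then
    echo "Missing in /usr/local/src:"
    echo "$missing"
else
    echo "All source archives are present."
fi
```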


◆The procedure is as follows.

1. Install KAKASI

2. Install Namazu

3. Configure Namazu

4. Configure Apache



◆Installing KAKASI
The latest version of KAKASI at the time of download was 2.3.4.
# kakasi -h                                Check whether KAKASI is already installed.
kakasi: Command not found.           KAKASI is not installed yet.

# cd /usr/local/src

/usr/local/src# tar zxvf kakasi-2.3.4.tar.gz      Unpack kakasi-2.3.4.tar.gz.

/usr/local/src# cd kakasi-2.3.4

/usr/local/src/kakasi-2.3.4# ./configure  

/usr/local/src/kakasi-2.3.4# make  

/usr/local/src/kakasi-2.3.4# su  

/usr/local/src/kakasi-2.3.4# make install  


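This completes the KAKASI installation.
Next, verify the installation.
/usr/local/src/kakasi-2.3.4#  cd /root
 
# kakasi -h

Information about KAKASI is displayed. If the following line appears at the bottom of the output, the KAKASI patch has been applied successfully.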

-w:wakatigaki mode (added by H.Baba,sun Jul  7  16:58:40 JST 1996)
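
The manual check above can also be scripted. This is only a sketch: has_wakati is a hypothetical helper name, and it assumes the help text contains the string "-w:wakatigaki" exactly as shown in the line above.

```shell
#!/bin/sh
# Sketch: report whether kakasi's help text lists the -w (wakatigaki) option.
# has_wakati is a hypothetical helper; it reads the help text on stdin.
has_wakati() {
    grep -q -e '-w:wakatigaki'
}

if command -v kakasi >/dev/null 2>&1; then
    # The help text may go to stderr, so merge both streams.
    if kakasi -h 2>&1 | has_wakati; then
        echo "kakasi: wakatigaki mode available"
    else
        echo "kakasi: -w not listed in help output"
    fi
else
    echo "kakasi: not installed"
fi
```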



◆Installing Namazu

The latest version of Namazu at the time of download was 2.0.17.

# cd /usr/local/src

/usr/local/src# tar zvxf namazu-2.0.17.tar.gz      Unpack namazu-2.0.17.tar.gz.

/usr/local/src# cd namazu-2.0.17/File-MMagic

/usr/local/src/namazu-2.0.17/File-MMagic# perl Makefile.PL

/usr/local/src/namazu-2.0.17/File-MMagic# make install

/usr/local/src/namazu-2.0.17/File-MMagic# cd ../

/usr/local/src/namazu-2.0.17# ./configure --prefix=/usr/local

/usr/local/src/namazu-2.0.17# make

/usr/local/src/namazu-2.0.17# su

/usr/local/src/namazu-2.0.17# make install

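/usr/local/src/namazu-2.0.17# cd /root             Return to root's home directory and test whether the installation succeeded.

# mknmz /usr/local/namazu/doc                Build an index from the HTML files in /usr/local/namazu/doc.

# namazu lynx .                                       Search the index in the current directory for the keyword "lynx" and display the results.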
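
If the search reports that no index can be found, a quick check of the index directory may help. This is a sketch only: looks_like_index is a hypothetical helper, and treating the presence of NMZ.i (one of the NMZ.* files written by mknmz) as sufficient is an assumption.

```shell
#!/bin/sh
# Sketch: check whether a directory looks like a Namazu index.
# looks_like_index is a hypothetical helper. It only tests for NMZ.i,
# one of the NMZ.* files that mknmz writes; treating that single file
# as a sufficient check is an assumption.
looks_like_index() {
    [ -f "$1/NMZ.i" ]
}

if looks_like_index .; then
    echo "Index found in the current directory"
else
    echo "No index here; run mknmz first"
fi
```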


◆Configuring Namazu

# su  

# mkdir /var/lib/apache/htdocs/index      

# mknmz -O /var/lib/apache/htdocs/index /var/lib/apache/htdocs    Index /var/lib/apache/htdocs and place the index under /var/lib/apache/htdocs/index/.

# cd /root

# namazu keyword /var/lib/apache/htdocs/index           Confirm that a search works.
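
Because the index has to be rebuilt whenever the site content changes, the indexing step can be wrapped in a small script. The following is a sketch under assumptions: reindex_command is a hypothetical helper, the RUN variable is an invented switch for this example, and the paths are the document root and index directory used in this section.

```shell
#!/bin/sh
# Sketch: rebuild the site index in one step, e.g. from cron.
# DOCROOT and INDEX_DIR are the paths used in this section; reindex_command
# is a hypothetical helper that only prints the command line.
DOCROOT=/var/lib/apache/htdocs
INDEX_DIR=$DOCROOT/index

reindex_command() {
    # $1 = directory to index, $2 = directory that receives the NMZ.* files
    echo "mknmz -O $2 $1"
}

cmd=$(reindex_command "$DOCROOT" "$INDEX_DIR")
echo "$cmd"

# Run it only when asked to (RUN=1) and when mknmz is installed.
if [ "${RUN:-0}" = 1 ] && command -v mknmz >/dev/null 2>&1; then
    mkdir -p "$INDEX_DIR" && $cmd
fi
```

Without RUN=1 the script only prints the command, which makes it safe to try before touching a live index.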


◆Configuring Apache

  • Configuring Apache
    # cp /usr/local/libexec/namazu.cgi /var/lib/apache/cgi-bin/namazu.cgi  ←Copies namazu.cgi into /var/lib/apache/cgi-bin/ so that it can be served as a CGI program.
    

    Edit the Apache configuration file.

    Editing httpd.conf
    #/var/lib/apache/bin/apachectl stop ←Stops the Apache service.
    #vi /var/lib/apache/conf/httpd.conf ←Opens httpd.conf in vi.
    # This may also be "None", "All", or any combination of "Indexes",
    # "Includes", "FollowSymLinks", "ExecCGI", or "MultiViews".
    #
    # Note that "MultiViews" must be named *explicitly* --- "Options All"
    # doesn't give it to you.
    #
        Options Includes FollowSymLinks ExecCGI ←Includes enables SSI; ExecCGI allows CGI programs to run.
    
        # ScriptAlias: This controls which directories contain server scripts.
        # ScriptAliases are essentially the same as Aliases, except that
        # documents in the realname directory are treated as applications and
        # run by the server when requested rather than as documents sent to the client.
        # The same rules about trailing "/" apply to ScriptAlias directives as to
        # Alias.
        #
    ScriptAlias /cgi-bin/ "/var/lib/apache/cgi-bin/" ←The directory that holds CGI scripts.
        #
        # "/var/lib/apache/cgi-bin" should be changed to whatever your ScriptAliased
        # CGI directory exists, if you have that configured.
        #
        
        <Directory "/var/lib/apache/cgi-bin">
            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
        </Directory>
        
    
        # If you want to use server side includes, or CGI outside
        # ScriptAliased directories, uncomment the following lines.
        #
        # To use CGI scripts:
        #
        # Enable the use of CGI.
        #
    AddHandler cgi-script .cgi ←Enables CGI for files ending in .cgi.
    
    If this line is commented out with #, remove the #.
    

    #/var/lib/apache/bin/apachectl start ←Starts the Apache service.
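
To double-check the httpd.conf edits, the two directives can be searched for with grep. This is a sketch, not the author's procedure: conf_has_cgi is a hypothetical helper, and it only looks for uncommented ScriptAlias and AddHandler lines of the form shown above.

```shell
#!/bin/sh
# Sketch: confirm that httpd.conf contains the two active (uncommented)
# directives edited above. conf_has_cgi is a hypothetical helper.
conf_has_cgi() {
    conf=$1
    grep -Eq '^[[:space:]]*ScriptAlias[[:space:]]+/cgi-bin/' "$conf" &&
    grep -Eq '^[[:space:]]*AddHandler[[:space:]]+cgi-script[[:space:]].*\.cgi' "$conf"
}

CONF=/var/lib/apache/conf/httpd.conf
if [ -r "$CONF" ] && conf_has_cgi "$CONF"; then
    echo "CGI directives are active in $CONF"
else
    echo "ScriptAlias or AddHandler is missing or still commented out"
fi
```

Running apachectl configtest before starting the server additionally checks the file for syntax errors.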

    Editing the namazurc file

    #cp /usr/local/etc/namazu/namazurc-sample  /var/lib/apache/cgi-bin/.namazurc    ←Copies /usr/local/etc/namazu/namazurc-sample to /var/lib/apache/cgi-bin as .namazurc.
    
    #cd /var/lib/apache/cgi-bin
    
    #vi .namazurc
    Basically, remove the leading # from the directives you need and set the paths to match the Apache configuration.
    # This is a Namazu configuration file for namazu or namazu.cgi.
    #
    #  Originally, this file is named 'namazurc-sample'.  so you should
    #  copy this to 'namazurc' to make the file effective.
    #  
    #  Each item must be separated by one or more SPACE or TAB characters. 
    #  You can use a double-quoted string for representing a string which 
    #  contains SPACE or TAB characters like "foo bar baz".
    
    
    ##
    ## Index: Specify the default directory.
    ## 
    Index         /var/lib/apache/htdocs/
    
    
    ##
    ## Template: Set the template directory containing
    ## NMZ.{head,foot,body,tips,result} files.
    ##
    Template      /var/lib/apache/htdocs/
    
    
    ##
    ## Replace: Replace TARGET with REPLACEMENT in URIs in search
    ## results.  
    ##
    ## TARGET is specified by Ruby's perl-like regular expressions.  
    ## You can capture sub-strings in TARGET by surrounding them 
    ## with `(' and `)' and use them later as backreferences by
    ## \1, \2, \3,... \9.
    ##  
    ## To use meta characters literally such as `*', `+', `?', `|', 
    ## `[', `]', `{', `}', `(', `)', escape them with `\'.
    ##  
    ## e.g.,
    ##  
    ##    Replace  /home/foo/public_html/   http://www.foobar.jp/~foo/
    ##    Replace  /home/(.*)/public_html/  http://www.foobar.jp/\1/
    ##    Replace  /[Cc]\|/foo/             http://www.foobar.jp/
    ##  
    ## If you do not want to do the processing on command line use, 
    ## run namazu with -U option.
    ##
    ## You can specify more than one Replace rule, but only the 
    ## first matching rule is applied. 
    ##
    Replace       /var/lib/apache/htdocs/  http://www.DOMAIN_NAME/
    
    
    ##
    ## Logging: Set OFF to turn off keyword logging to NMZ.slog. 
    ## Default is ON.
    ##
    Logging       ON
    
    
    ##
    ## Lang: Set the locale code such as `ja_JP.eucJP', `ja_JP.SJIS', 
    ## `de', etc.  This directive works only if the environment 
    ## variable LANG is not set because the directive is mainly 
    ## intended for CGI use.  On the shell, you can set the 
    ## environment variable LANG instead of using the directive.
    ## 
    ## If you set `de' to it, namazu.cgi use 
    ## NMZ.(head|foot|body|tips|results).de for displaying results 
    ## and use a proper message catalog for `de'.
    ##
    Lang          ja
    
    
    ##
    ## Scoring: Set the scoring method "tfidf" or "simple".
    ##
    Scoring       tfidf
    
    
    ##
    ## EmphasisTags: Set the pair of html elements which is used in
    ## keyword emphasizing for search results.
    ##
    EmphasisTags  "<strong class=\"keyword\">"   "</strong>"
    
    ##
    ## MaxHit: Set the maximum number of documents which can be
    ## handled in query operation.  If documents matching a
    ## query exceed the value, they will be ignored.
    ##
    MaxHit	10000
    
    ##
    ## MaxMatch: Set the maximum number of words which can be
    ## handled in regex/prefix/inside/suffix query. If documents
    ## matching a query exceed the value, they will be ignored.
    ##
    MaxMatch	1000
    
    ##
    ## ContentType: Set "Content-Type" header output. If you want to
    ## use non-HTML template files, set it suitably.
    ##
    #ContentType	"text/x-hdml"
    
    ##
    ## Suicide_Time: namazu.cgi stops the process in 60 seconds by
    ## default.
    ## (Only UNIX)
    ##
    #Suicide_Time	60
    
    ##
    ## Regex_Search: Set OFF to turn off regex_search.
    ## Default is ON.
    ##
    #Regex_Search	off
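
The effect of the Replace directive in the file above can be illustrated with sed. This is only an imitation for explanation, not how namazu.cgi implements it. The helper name to_url and the domain www.example.jp are placeholders, and anchoring the pattern at the start of the path is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch: imitate what a single Replace rule does to a path in a search result.
# www.example.jp is a placeholder domain; to_url is a hypothetical helper.
to_url() {
    printf '%s\n' "$1" | sed 's|^/var/lib/apache/htdocs/|http://www.example.jp/|'
}

to_url /var/lib/apache/htdocs/docs/index.html
```

The document root prefix is swapped for the site URL, so a file indexed on disk shows up in the results as a link that a browser can follow.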
    

    Editing the mknmzrc file

    #cp /usr/local/etc/namazu/mknmzrc-sample  /var/lib/apache/cgi-bin/.mknmzrc    ←Copies /usr/local/etc/namazu/mknmzrc-sample to /var/lib/apache/cgi-bin as .mknmzrc.
    
    #cd /var/lib/apache/cgi-bin
    
    #vi .mknmzrc
    Here too, basically all you need to do is remove the leading #.
    #
    # This is a Namazu configuration file for mknmz.
    #
    package conf;  # Don't remove this line!
    
    #===================================================================
    #
    # Administrator's email address
    #
     $ADDRESS = 'hirosima@DOMAIN_NAME';
    
    
    #===================================================================
    #
    # Regular Expression Patterns
    #
    
    #
    # This pattern specifies HTML suffixes.
    #
     $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";
    
    #
    # This pattern specifies file names which will be targeted.
    # NOTE: It can be specified by --allow=regex option.
    #       Do NOT use `$' or `^' anchors.
    #       Case-insensitive.
    #
     $ALLOW_FILE =	".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
     		"|.*\\.gz|.*\\.Z|.*\\.bz2" .       # Compressed files
     		"|.*\\.pdf|.*\\.ps" . 		   # PDF, PostScript
     		"|.*\\.tex|.*\\.dvi" .   	   # TeX, DVI
     		"|.*\\.rpm|.*\\.deb" .   	   # RPM, DEB
     		"|.*\\.doc|.*\\.xls|.*\\.pp[st]" . # Word, Excel, PowerPoint
     		"|.*\\.docx|.*\\.xlsx|.*\\.pp[st]x" . # MS-OfficeOpenXML Word, Excel, PowerPoint
     		"|.*\\.vs[dst]|.*\\.v[dst]x" .     # Visio
     		"|.*\\.j[sabf]w|.*\\.jtd" .        # Ichitaro 4, 5, 6, 7, 8
     		"|.*\\.sx[widc]" .                 # OpenOffice Writer,Calc,Impress,Draw
     		"|.*\\.od[tspg]" .                 # OpenOffice2.0
     		"|.*\\.rtf" .                      # Rich Text Format
     		"|.*\\.hdml|.*\\.mht" .		   # HDML MHTML
     		"|.*\\.mp3" .			   # MP3 
     		"|.*\\.gnumeric" .                 # Gnumeric
     		"|.*\\.kwd|.*\\.ksp" .             # KWord, KSpread
     		"|.*\\.kpr|.*\\.flw" .             # KPresenter, Kivio
     		"|.*\\.eml|\\d+|[-\\w]+\\.[1-9n]"; # Mail/News, man
    
    #
    # This pattern specifies file names which will NOT be targeted.
    # NOTE: It can be specified by --deny=regex option.
    #       Do NOT use `$' or `^' anchors.
    #       Case-insensitive.
    #
     $DENY_FILE = ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
    
    #
    # This pattern specifies DDN(DOS Device Name) which will NOT be targeted.
    # NOTE: Only for Windows.
    #       Do NOT use `$' or `^' anchors.
    #       Case-insensitive.
    #
     $DENY_DDN = "con|aux|nul|prn|lpt[1-9]|com[1-9]|clock\$|xmsxxxx0";
    
    #
    # This pattern specifies PATHNAMEs which will NOT be targeted.
    # NOTE: Usually specified by --exclude=regex option.
    #
     $EXCLUDE_PATH = undef;
    
    #
    # This pattern specifies file names which can be omitted 
    # in URI.  e.g., 'index.html|index.htm|Default.html'
    #
    # NOTE: This is similar to Apache's "DirectoryIndex" directive.
    #
     $DIRECTORY_INDEX = "";
    
    #
    # This pattern specifies Mail/News's fields in its header which 
    # should be searchable.  NOTE: case-insensitive
    #
     $REMAIN_HEADER = "From|Date|Message-ID";
    
    #
    # This pattern specifies fields which used for field-specified 
    # searching.  NOTE: case-insensitive
    # 
     $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";
    
    #
    # This pattern specifies meta tags which used for field-specified 
    # searching.  NOTE: case-insensitive
    #
     $META_TAGS = "keywords|description";
    
    #
    # This pattern specifies aliases for NMZ.field.* files.
    # NOTE: Editing NOT recommended.
    #
     %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');
    
    #
    # This pattern specifies HTML elements which should be replaced with 
    # null string when removing them. Normally, the elements are replaced 
    # with a single space character.
    #
     $NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
                            'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';
    
    #
    # This pattern specifies attribute of a HTML tag which should be 
    # searchable.
    #
    # $HTML_ATTRIBUTES = 'ALT|SUMMARY|TITLE';
    
    
    #===================================================================
    # 
    # Critical Numbers
    # 
    
    # 
    # The max size of files which can be loaded in memory at once.
    # If you have much memory, you can increase the value.
    # If you have less memory, you can decrease the value.
    #
     $ON_MEMORY_MAX   = 512000000; ←Allows roughly 500 MB of memory while the search index is being built.
    
    #
    # The max file size for indexing. Files larger than this 
    # will be ignored.
    # NOTE: This value is usually larger than TEXT_SIZE_MAX because 
    #       binary-formated files such as PDF, Word are larger.
    #
     $FILE_SIZE_MAX   = 2000000;
    
    #
    # The max text size for indexing. Files larger than this 
    # will be ignored.
    #
     $TEXT_SIZE_MAX   =  600000;
    
    #
    # The max length of a word. the word longer than this will be ignored.
    #
     $WORD_LENG_MAX   = 128;
    
    
    #
    # Weights for HTML elements which are used for term weighting.
    #
     %Weight = 
         (
          'html' => {
              'title'  => 16,
              'h1'     => 8,
              'h2'     => 7,
              'h3'     => 6,
              'h4'     => 5,
              'h5'     => 4,
              'h6'     => 3,
              'a'      => 4,
              'strong' => 2,
              'em'     => 2,
              'kbd'    => 2,
              'samp'   => 2,
              'var'    => 2,
              'code'   => 2,
              'cite'   => 2,
              'abbr'   => 2,
              'acronym'=> 2,
              'dfn'    => 2,
          },
          'metakey' => 32, # for <meta name="keywords" content="foo bar">
          'headers' => 8,  # for Mail/News' headers
     );
    
    #
    # The max length of a HTML-tagged string which can be processed for
    # term weighting. 
    # NOTE: Quite a few people misuse <h[1-6]> 
    #        for changing a font size.
    #
     $INVALID_LENG = 128; 
    
    #
    # The max length of a field.
    # This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
    #
     $MAX_FIELD_LENGTH = 200;
    
    
    #===================================================================
    #
    # Softwares for handling a Japanese text
    #
    
    #
    # Network Kanji Filter nkf v1.71 or later
    #
     $NKF = "/usr/local/bin/nkf"; 
    
    #
    # KAKASI 2.x or later
    # Text::Kakasi 1.05 or later
    #
     $KAKASI = "/usr/local/bin/kakasi -ieuc -oeuc -w";
    
    #
    # ChaSen 2.02 or later (simple wakatigaki)
    # Text::ChaSen 1.03
    #
    # $CHASEN = "no";
    
    #
    # ChaSen 2.02 or later (with noun words extraction)
    #
    # $CHASEN_NOUN = "no";
    
    #
    # MeCab
    #
    # $MECAB = "no";
    
    #
    # Default Japanese processor: KAKASI or ChaSen.
    #
     $WAKATI  = $KAKASI;
    
    
    #===================================================================
    #
    # Directories
    #
    # $LIBDIR = "@PERLLIBDIR@";
    # $FILTERDIR = "@FILTERDIR@";
    # $TEMPLATEDIR = "@TEMPLATEDIR@";
    
    # 1;
    
    

    With this in place, all that remains is to create a search form.
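
A minimal search form might look like the one generated below. This is a sketch under assumptions: write_form and the file name search.html are hypothetical, the action URL follows the ScriptAlias configured earlier, and the form passes the search terms in a field named "query".

```shell
#!/bin/sh
# Sketch: print a minimal search form that submits to namazu.cgi.
# write_form is a hypothetical helper; /cgi-bin/namazu.cgi follows the
# ScriptAlias set up earlier, and "query" carries the search terms.
write_form() {
    cat <<'EOF'
<html>
<head><title>Site search</title></head>
<body>
<form method="get" action="/cgi-bin/namazu.cgi">
  <input type="text" name="query" size="40">
  <input type="submit" value="Search">
</form>
</body>
</html>
EOF
}

# Print to stdout; redirect into the document root to publish it, e.g.
#   sh make_form.sh > /var/lib/apache/htdocs/search.html
write_form
```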

    Copyright (C) 1998 hero-island. All Rights Reserved.