
Writing HTTP Requests with Curl: Honing Your cURL Skills

14.07.2019

This article assumes that you know the basics of networking and HTML.

The ability to write scripts is essential to building a good computer system. The way Unix systems can be extended with shell scripts and small programs that run automated commands is one of the reasons they have been so successful.

As more and more applications move to the web, HTTP scripting is in growing demand. Typical tasks in this area are automatically extracting information from the internet and sending or uploading data to web servers.

Curl is a command-line tool for performing URL manipulations and transfers of many kinds. This article focuses on making simple HTTP requests. It assumes you already know where to type

# curl --help

# curl --manual

to get information about curl.

Curl is not a tool that will do everything for you. It builds requests, receives data and sends data. You will probably need some "glue" to tie everything together, such as a scripting language (bash, for example) or a few manual invocations.

1. The HTTP Protocol

HTTP is the protocol used to fetch data from web servers. It is a very simple protocol built on top of TCP/IP. It also lets the client send information to the server using several methods, as shown below.

HTTP consists of lines of ASCII text that the client sends to the server to request an action. When the server receives the request, it replies with a few lines of header text followed by the actual content.

With curl's -v option you can see the commands curl sends to the server, along with other informational text. The -v option is probably the single most useful option for debugging, or even just understanding, the interaction between curl and a web server.
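To make the "lines of ASCII text" concrete, here is a rough sketch that prints the kind of request curl sends for a plain GET. The helper name build_get_request and the exact set of headers are assumptions made for illustration; what your curl really sends (visible with -v) depends on its version and options.

```shell
# Hypothetical helper: print a minimal HTTP/1.1 GET request for a host and path.
# Every header line ends with CRLF; an empty line closes the header block.
build_get_request() {
    host=$1
    path=$2
    printf 'GET %s HTTP/1.1\r\n' "$path"
    printf 'Host: %s\r\n' "$host"
    printf 'User-Agent: curl/7.29.0\r\n'
    printf 'Accept: */*\r\n'
    printf '\r\n'
}

# Show the request with the carriage returns stripped for readability.
build_get_request curl.haxx.se / | tr -d '\r'
```

The first line names the method, the path and the protocol version; everything after it is a header.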

2. URL

A URL (Uniform Resource Locator) specifies the address of a particular resource on the Internet. You probably know this already; examples of URLs are http://curl.haxx.se and https://yourbank.com.

3. Getting (GET) a Page

The simplest and most common HTTP request is to get the contents of a URL. The URL may refer to a web page, an image or a file. The client sends a GET request to the server and receives the requested document. If you run the command

# curl http://curl.haxx.se

the web page is printed in your terminal window: the complete HTML document found at that URL.

Every HTTP response contains a set of headers that are normally hidden. To see them together with the document itself, use curl's -i option. You can also ask for the headers only with the -I option (which makes curl send a HEAD request).

4. Forms

Forms are the main way a web site presents an HTML page with fields in which the user enters data and then presses an "OK" or "Submit" button, after which the data is sent to the server. The server then uses the received data to decide what to do next: look up information in a database, show the entered address on a map, add an error message, or use the information to authenticate the user. Naturally, there is some program on the server side that receives your data.

4.1 GET

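A GET form uses the GET method, for example like this:

<form method="GET" action="junk.cgi">
<input type=text name="birthyear">
<input type=submit name=press value="OK">
</form>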

If you open this code in your browser, you will see a form with a text field and a button labeled "OK". If you enter "1905" and press OK, the browser builds a new URL and follows it. The URL consists of the path of the previous URL followed by a string like "junk.cgi?birthyear=1905&press=OK".

For example, if the form was located at "www.hotmail.com/when/birth.html", pressing OK takes you to the URL "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK".

Most search engines work this way.

To make curl issue the same GET request, simply enter what the form would have produced:

# curl "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK"
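The way a browser turns form fields into such a URL can be sketched in a few lines of shell. The helper build_query is a made-up name, and the sketch assumes the field values contain no characters that need URL encoding.

```shell
# Hypothetical helper: join name=value pairs with "&", the way a browser
# does when it submits a GET form.
build_query() {
    query=""
    for pair in "$@"; do
        if [ -z "$query" ]; then
            query=$pair
        else
            query="$query&$pair"
        fi
    done
    printf '%s\n' "$query"
}

base="www.hotmail.com/when/junk.cgi"
url="$base?$(build_query birthyear=1905 press=OK)"
echo "$url"
# The result is what you would pass to curl, quoted because of the "&":
#   curl "$url"
```

Quoting the URL matters: an unquoted "&" would make the shell run curl in the background and treat the rest as a separate command.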

4.2 POST

With the GET method, all the information you enter is shown in your browser's address bar. That can be useful when you want to bookmark the page, but it is an obvious drawback when you enter sensitive information into the form fields, or when the amount of data entered is so large that the URL becomes unreadable.

The HTTP protocol also offers the POST method. With it, the client sends the data separately from the URL, so you will not see it in the address bar.

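A form that generates a POST request is similar to the previous one:

<form method="POST" action="junk.cgi">
<input type=text name="birthyear">
<input type=submit name=press value=" OK ">
</form>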

Curl can issue a POST request with the same data like this:

# curl -d "birthyear=1905&press=%20OK%20" www.hotmail.com/when/junk.cgi

This POST request uses the Content-Type application/x-www-form-urlencoded, which is the most widely used kind.

The data you send to the server must already be properly encoded; with -d, curl will not do that for you. For example, if you want the data to contain a space, you need to replace the space with %20, and so on. Overlooking this is a common mistake that causes the data to arrive differently from what you intended.
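If you have to do the encoding yourself, a small helper along the following lines may be enough. This is only a sketch: urlencode is a made-up name, it works byte by byte, and it leaves only the unreserved characters (letters, digits, "-", "_", "." and "~") untouched. Curl 7.18.0 and later also have a --data-urlencode option that encodes the value for you.

```shell
# Hypothetical helper: percent-encode a string byte by byte.
urlencode() {
    # Dump the input as hex bytes, one per line, then decide per byte.
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case $hex in
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # Unreserved character: print the byte itself via an octal escape.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # Anything else becomes %XX with upper-case hex digits.
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
    printf '\n'
}

data="birthyear=1905&press=$(urlencode ' OK ')"
echo "$data"
# This string could then be posted with:
#   curl -d "$data" www.hotmail.com/when/junk.cgi
```

Only the field values are passed through the helper; the "=" and "&" that separate fields must stay unencoded.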

4.3 File Upload POST

Back in 1995 an additional way to send data over HTTP was defined. It is documented in RFC 1867, which is why this method is sometimes called RFC1867-posting.

The method was designed mainly to support file uploads better. A form that lets the user upload a file looks roughly like this in HTML:

Note that the Content-Type is set to multipart/form-data.

To post to such a form with curl, enter the command:

# curl -F upload=@localfilename -F press=OK [URL]

4.4 Hidden Fields

A common way for HTML applications to pass state information is to use hidden fields in forms. Hidden fields are not filled in by the user, are invisible to the user, and are sent in the same way as ordinary fields.

A simple example of a form with one visible field, one hidden field and an OK button:

To send a POST request with curl you do not need to think about whether a field is hidden or not. To curl they are all the same:

# curl -d "birthyear=1905&press=OK&person=daniel" [URL]

4.5 Finding Out What a POST Request Looks Like

When you want to fill in a form and send the data to the server with curl, you will most likely want the POST request to look exactly like the one the browser makes.

A simple way to see your POST request is to save the HTML page with the form to disk, change the method to GET, and press the "Submit" button (you may also need to change the URL that the data is sent to).

You will see the data appended to the URL, separated from it by a "?" character, as is expected for GET forms.

5. PUT

Perhaps the best way to upload data to an HTTP server is to use PUT. Again, this requires a program (script) on the server side that knows how to receive an HTTP PUT stream and what to do with it.

To send a file to the server with curl:

# curl -T uploadfile www.uploadhttp.com/receive.cgi

6. Authentication

Authentication means giving the server a user name and password so that it can check whether you are allowed to make the request. Basic authentication (which curl uses by default) is plain-text based: the user name and password are not encrypted, only lightly obscured with Base64 encoding, so an attacker on the path between you and the HTTP server can recover them.

To tell curl to use a user name and password:

# curl -u name:password www.secrets.com
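To see how thin the Base64 "obscuring" is, the header that Basic authentication sends can be reproduced by hand. The helper name basic_auth_header is made up for this sketch, and it assumes the base64 utility is installed.

```shell
# Hypothetical helper: build the Authorization header that Basic
# authentication sends for a "user:password" pair.
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s' "$1" | base64)"
}

header=$(basic_auth_header 'name:password')
echo "$header"

# Anyone who sees the header can reverse the encoding:
decoded=$(printf '%s' "${header##* }" | base64 --decode)
echo "$decoded"
```

The second command recovers the original name:password pair, which is why Basic authentication is only reasonable over an encrypted connection.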

A site may require a different authentication method (check what the server says in its headers); in those cases you can use the options --ntlm, --digest, --negotiate or even --anyauth. Sometimes access to external HTTP servers goes through a proxy, as is common in companies. An HTTP proxy may require its own user name and password for access to the Internet. The corresponding curl option is:

# curl -U proxyuser:proxypassword curl.haxx.se

If the proxy requires NTLM authentication, specify --proxy-ntlm; if it requires Digest, use --proxy-digest.

If you leave the password out of the -u and -U options, curl will prompt you for it interactively.

Note that while curl is running, its command line (and with it the options and passwords) may be visible to other users of your system in the process list. There are ways to prevent this.
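One way to keep the password out of the process list is to put it in a curl config file and pass that file with -K/--config. The following is a sketch under the assumption that the credentials are the name:password pair used earlier; the temporary file location is chosen by mktemp.

```shell
# Sketch: store the credentials in a config file instead of on the command line.
# mktemp creates the file readable and writable by its owner only.
config=$(mktemp)
cat > "$config" <<'EOF'
# Hypothetical credentials, the same pair as in the -u example
user = "name:password"
EOF

# The process list now shows only the path of the config file:
echo "curl -K $config www.secrets.com"
```

Remove the file when you no longer need it, and keep in mind that the password is still stored in plain text on disk.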

7. Referer

An HTTP request may include a "referer" field, which states the URL from which the user arrived at the resource. Some programs and scripts check the "referer" field and refuse the request if the user came from an unknown page. Although this is a foolish way to check, many scripts rely on it. With curl you can put anything you like into the "referer" field and so get the request carried out.

This is done as follows:

# curl -e http://curl.haxx.se daniel.haxx.se

8. User Agent

Every HTTP request may carry a "User-Agent" field that names the user's client application. Many web applications use this information to decide how to render the page. Web developers create several versions of a page for users of different browsers to improve its appearance or to use different JavaScript, VBScript and so on.

Sometimes you will find that curl returns a page that differs from the one you saw in your browser. That is the time to set the "User-Agent" field to fool the server once again.

To make curl look like Internet Explorer on a Windows 2000 machine:

# curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]

Or why not become Netscape 4.73 on a Linux (PIII) machine:

# curl -A "Mozilla/4.73 (X11; U; Linux 2.2.15 i686)" [URL]

9. Redirects

In reply to your request, the server may return, instead of the page itself, an instruction telling the browser where to go next to reach the page. The header that tells the browser about such a redirect is "Location:".

By default curl does not go to the address given in "Location:" but simply shows the response as usual. You can tell it to follow the redirect like this:

# curl -L www.sitethatredirects.com

If you use curl to POST to a site that immediately redirects to another page, you can safely combine -L with -d/-F. Curl will send a POST request for the first page and then a GET request for the following one.
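If you would rather inspect a redirect than follow it, you can pull the target out of the response headers yourself. The headers below are made up for illustration; in practice they would come from curl -i or curl -D.

```shell
# Sample response headers (invented for this sketch) for a redirecting page.
headers='HTTP/1.1 302 Found
Location: http://www.sitethatredirects.com/new/
Content-Length: 0'

# Extract the redirect target: the value of the Location header.
# Header names are case-insensitive, so compare them in lower case.
location=$(printf '%s\n' "$headers" | tr -d '\r' |
    awk 'tolower($1) == "location:" { print $2; exit }')
echo "$location"
```

The tr call strips the carriage returns that real HTTP headers carry at the end of each line.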

10. Cookies

Cookies are how web browsers keep state on the client side. A cookie is a name with associated content. When the server sends cookies, it tells the client the path and host name for which the cookies should be sent back next time, their lifetime, and a few other parameters.

When the client connects to a server at an address matching the one in a received cookie, it sends that cookie to the server (unless its lifetime has expired).

Many applications and servers use this method to tie several requests together into one logical session. For curl to do the same, we must be able to save and send cookies just as browsers do.

The simplest way to send a cookie to the server when fetching a page with curl is to add the appropriate option on the command line:

# curl -b "name=Daniel" www.cookiesite.com

Cookies are sent as ordinary HTTP headers. This lets curl save cookies by saving the headers. To save cookies this way, run:

# curl -D headers_and_cookies www.cookiesite.com

(By the way, the -c option is a better way to save cookies; more on that below.)

Curl has a full-featured cookie engine, which is useful when you want to reconnect to a server and use cookies saved from a previous session (or edited by hand). To use cookies stored in a file, invoke curl like this:

# curl -b stored_cookies_in_file www.cookiesite.com

Curl's "cookie engine" is switched on when you specify the -b option. If you only want curl to accept cookies, use -b with a file that does not exist. For example, if you want curl to accept the cookies from a page and then follow a redirect (possibly sending back the cookie it has just received), you can invoke it like this:

# curl -b nada -L www.cookiesite.com

Curl can read and write cookie files in the Netscape and Mozilla format. This is a convenient way to share cookies between browsers and automated scripts. The -b option automatically detects whether a given file is a cookie file from one of those browsers and handles it accordingly, and with the -c/--cookie-jar option you can make curl write a new cookie file when the operation completes:

# curl -b cookies.txt -c newcookies.txt www.cookiesite.com
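The Netscape cookie file is a plain text file with one cookie per line and seven tab-separated fields. The sketch below writes such a file by hand and reads the name=value pairs back; the cookie itself is the hypothetical name=Daniel from the earlier example.

```shell
cookiefile=$(mktemp)

# Fields: domain, subdomain flag, path, secure flag, expiry (Unix time,
# 0 for a session cookie), name, value
printf '# Netscape HTTP Cookie File\n' > "$cookiefile"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    www.cookiesite.com FALSE / FALSE 0 name Daniel >> "$cookiefile"

# List the name=value pairs, skipping comment lines and malformed lines.
cookies=$(awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }' "$cookiefile")
echo "$cookies"
```

A file like this could be passed to curl with -b, and a file written by -c can be inspected the same way.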

11. HTTPS

There are several ways to secure your HTTP transfers. The best-known protocol for this is HTTPS, or HTTP over SSL. SSL encrypts all data sent and received over the network, which makes it more likely that your information stays private.

Curl supports requests to HTTPS servers thanks to the freely available OpenSSL library. Requests are made in the usual way:

# curl https://that.secure.server.com

11.1 Certificates

In the HTTPS world you use certificates for authentication in addition to a user name and password. Curl supports client-side certificates. All certificates are locked with a passphrase that you must enter before curl can use them. The passphrase can be given on the command line or entered interactively. Certificates are used with curl like this:

# curl -E mycert.pem https://that.secure.server.com

Curl also verifies the server's authenticity by checking the server certificate against locally stored ones. If they do not match, curl refuses to connect. To skip the authenticity check, use the -k option.

More information about certificates can be found at http://curl.haxx.se/docs/sslcerts.html.

12. Custom Request Headers

You may need to change or add elements of individual curl requests.

For example, you can change the POST request to PROPFIND and send the data as "Content-Type: text/xml" (instead of the default Content-Type):

# curl -d "" -H "Content-Type: text/xml" -X PROPFIND url.com

You can delete a header by specifying it without any content. For example, you can remove the "Host:" header, which leaves the request without a header that HTTP/1.1 servers expect:

# curl -H "Host:" http://mysite.com

You can also add headers. Perhaps your server needs a "Destination:" header:

# curl -H "Destination: http://moo.com/nowhere" http://url.com

13. Debugging

It often happens that a site responds to curl requests differently than to browser requests. In that case you need to make curl resemble the browser as closely as possible:

Use the --trace-ascii option to save a detailed log of the requests so that you can study them closely and track down the problem.

Make sure you check for cookies and use them when needed (-b to read them, -c to save them).

Set the "user-agent" field to one of the recent popular browsers.

Fill in the "referer" field the way the browser does.

If you use POST requests, make sure all fields are sent in the same order as the browser sends them (see section 4.5 above).

A good helper in this difficult task is the LiveHTTPHeader plugin for Mozilla/Firefox, which lets you view all the headers that the browser sends and receives (even when using HTTPS).

A lower-level approach is to capture the HTTP traffic on the network with programs such as ethereal or tcpdump and then analyze which headers the browser sent and received (HTTPS makes this approach ineffective).

14. References

RFC 2616 is a must-read for anyone who wants to understand the HTTP protocol.

RFC 2396 explains URL syntax.

RFC 2109 defines how cookies work.

RFC 1867 defines the File Upload Post format.

http://openssl.planetmirror.com - home page of the OpenSSL project

http://curl.haxx.se - home page of the cURL project

cURL is a command-line tool for getting or sending data using URL syntax.

If you work in a support role, you should be able to use cURL commands to troubleshoot web applications. cURL is a cross-platform utility for Windows, Mac and UNIX.
Below are some commonly used syntax examples.

1. Checking Connectivity to a URL

If you are on a UNIX system and are trying to connect to an external URL, first check that you can reach the resource with curl. Use the following command:

# curl yoururl.com

2. Saving URL / URI Output to a File

# curl yoururl.com > yoururl.html

For example:

# curl 74.125.68.100 >/tmp/google.html

The example above saves all content from the host 74.125.68.100 to the file /tmp/google.html.

3. Showing the Request and Response Headers

If you want to make sure you are getting the expected request and response headers, use the following command:

# curl -v yoururl.com

For example:

# curl -v 74.125.68.100
* About to connect() to 74.125.68.100 port 80 (#0)
* Trying 74.125.68.100...
* Connected to 74.125.68.100 (74.125.68.100) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 74.125.68.100
> Accept: */*
>
< HTTP/1.1 200 OK
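In the verbose trace, lines starting with ">" are what curl sent, lines starting with "<" are what the server answered, and lines starting with "*" are curl's own messages. A saved trace can be picked apart with standard tools; the sketch below uses a shortened copy of the trace above, held in a variable for illustration.

```shell
# A shortened copy of a verbose trace, stored in a variable for illustration.
trace='* Connected to 74.125.68.100 (74.125.68.100) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 74.125.68.100
> Accept: */*
>
< HTTP/1.1 200 OK'

# Keep only the request lines (prefix ">") and strip the prefix.
request=$(printf '%s\n' "$trace" | sed -n 's/^> //p')

# Pull the numeric status code out of the first response line (prefix "<").
status=$(printf '%s\n' "$trace" | sed -n 's/^< HTTP[^ ]* \([0-9]*\).*/\1/p')

echo "request line: $(printf '%s\n' "$request" | head -n 1)"
echo "status code:  $status"
```

When capturing a real trace, remember that curl writes the -v output to stderr, so it has to be redirected with 2> rather than >.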

4. Downloading at a Limited Rate

If you need to find out how long a download takes at a given speed, use the following command:

# curl --limit-rate 2000B yoururl.com

For example:

# curl --limit-rate 2000B 74.125.68.100
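As a rough cross-check of what to expect, the transfer time at a fixed rate is simply the size divided by the rate. The helper below is a made-up sketch that ignores connection setup and protocol overhead; the 50000-byte page size is an assumption for the example.

```shell
# Hypothetical helper: seconds needed to transfer size_bytes at
# rate_bytes_per_sec, rounded up to a whole second.
transfer_seconds() {
    size_bytes=$1
    rate_bytes_per_sec=$2
    echo $(( (size_bytes + rate_bytes_per_sec - 1) / rate_bytes_per_sec ))
}

# An assumed 50000-byte page at the 2000 bytes per second used above:
transfer_seconds 50000 2000
```

If the real download takes much longer than this estimate, the bottleneck is somewhere other than the rate limit.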

5. Using a Proxy to Connect

If you need to check whether a proxy server can be used, apply the following syntax:

# curl --proxy yourproxy:port http://yoururl.com

6. Testing a URL by Injecting a Header

To troubleshoot a specific problem you can use curl to insert your own data into the header. Consider the following example of a request with Content-Type:

# curl --header "Content-Type: application/json" http://yoururl.com

Here we ask curl to pass a Content-Type of application/json in the request header.

7. Adding an Extra Header

You can add a header to the request with the --header option.

# curl --header "X-CustomHeader: GeekFlare" http://yoururl.com

For example:

# curl -v --header "X-CustomHeader: GeekFlare" 74.125.68.100
* About to connect() to 74.125.68.100 port 80 (#0)
* Trying 74.125.68.100...
* Connected to 74.125.68.100 (74.125.68.100) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 74.125.68.100
> Accept: */*
> X-CustomHeader: GeekFlare
>
< HTTP/1.1 200 OK

8. Fetching Only the Response Headers

If you want to check the response headers quickly, you can use the following syntax.

# curl --head http://yoururl.com

# curl -I 74.125.68.100
HTTP/1.1 200 OK
Date: Sun, 18 Jan 2015 08:31:22 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: NID=67=SpnXKTDUhw7QGakIeLxmDSF; expires=Mon, 20-Jul-2015 08:31:22 GMT; path=/; domain=.; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic,p=0.02
Transfer-Encoding: chunked
Accept-Ranges: none
Vary: Accept-Encoding

9. Connecting to an HTTPS / SSL URL and Ignoring SSL Certificate Errors

If you need to access an https URL that produces a certificate error because of a host name mismatch, you can use the following syntax.

# curl --insecure https://yoururl.com

10. Connecting with a Specific Protocol (SSL / TLS)

To connect to a URL using only SSL V2 / V3 or TLS, use the following syntax.

To connect using SSLv2:

# curl --sslv2 https://yoururl.com

To connect using SSLv3:

# curl --sslv3 https://yoururl.com

To connect using TLS:

# curl --tlsv1 https://yoururl.com

11. Downloading a File from an FTP Server

With cURL you can download a file from an FTP server by specifying the user name and password.

# curl -u user:password -O ftp://ftpurl/style.css

You can always add "-v" to any of these commands for verbose output.

Using cURL Online

Yes, this is possible. You can run cURL remotely with the following tools.
Online CURL is a compact tool for fetching a URL online, with support for the following options.

--connect-timeout --cookie --data --header --head --location --max-time --proxy --request --user --url --user-agent

cURL command line builder lets you build a cURL command by entering the information in a user interface.

cURL is a useful utility for troubleshooting connectivity problems in real time.

Author: Obaro Ogbo
Published: April 29, 2015
Translated by: A. Krivoshey
Translation date: July 2015

curl is a cross-platform command-line utility for fetching and sending files using URL syntax. The name is a recursive acronym for Curl URL Request Library, and it is a very powerful program that supports a large number of network protocols, including HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP, LDAP, LDAPS, DICT, TELNET, FILE, IMAP, POP3, SMTP and RTSP.

curl supports a huge number of useful features, including user authentication, proxy servers, FTP, HTTP POST, cookies, resuming file transfers, SSL connections and much more. In this article we look at the basic capabilities of curl for those who are new to the program or know little about it.

Installation

To install curl on a Debian/Ubuntu system, use the following command:

$ sudo apt-get install curl

Syntax

curl expects a URL as its argument and will try to download whatever is available at that address.

$ curl http://www.maketecheasier.com

By default the contents of the transferred file are printed to the terminal. If an output file is given, the program shows a progress meter with the amount of data transferred, the transfer speed, the estimated time remaining and the time spent. To save the downloaded file under a given name, use the -o option:

$ curl -o mte-index.html http://www.maketecheasier.com

To save the file under the same name it has on the server, use the -O option:

$ curl -O ftp://ftp.kernel.org/pub/linux/kernel/v4.x/linux-4.0.tar.xz

To fetch a file from an FTP server that requires authentication:

$ curl -O ftp://username:[email protected]/pub/linux/kernel/v4.x/linux-4.0.tar.xz

To specify several URLs or parts of URLs, put the alternatives in braces:

$ curl -O http://www.maketecheasier.com/author/{obaro,ivana,vamsi}

You can also specify a sequential range with square brackets:

$ curl -O ftp://ftp.numericals.com/file[1-100].txt
$ curl -O ftp://ftp.letters.com/file[a-z].txt
$ curl -O http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

$ curl -o "file_#1.txt" http://{one,two}.site.com
$ curl -o "output_#1_#2" http://{site,host}.host[1-5].com
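A numeric range in square brackets stands for one URL per number, and in the -o file name "#1" is replaced by the current value of the first brace or bracket expression. The sketch below lists what a small range expands to; expand_range is a made-up helper, and the range 1 to 3 is chosen only to keep the output short.

```shell
# Hypothetical helper: list the URLs and output names that a numeric range
# such as "prefix[first-last]suffix" with -o "file_#1.txt" stands for.
# Arguments: prefix, first, last, suffix.
expand_range() {
    i=$2
    while [ "$i" -le "$3" ]; do
        printf '%s%s%s -> file_%s.txt\n' "$1" "$i" "$4" "$i"
        i=$((i + 1))
    done
}

expand_range "ftp://ftp.numericals.com/file" 1 3 ".txt"
```

Remember to quote URLs containing braces or brackets, otherwise the shell may try to expand them before curl sees them.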

You can set a User-Agent string to identify yourself to servers; for HTTP connections use the -A flag:

$ curl -A "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0" -o mte-index.html http://www.maketecheasier.com

To send data with HTTP POST, use the -d option:

$ curl -d "username=obaro" -d "password=password" http://maketecheasier.com

To save the cookies the server sends back, add the -c option with a file name:

$ curl -d "username=obaro" -d "password=password" -c mte-cookies http://maketecheasier.com

To use these saved cookies, pass the -b option:

$ curl -b mte-cookies -d "hc_comment=This is a comment&submit=true" http://www.maketecheasier.com/monitor-hard-disk-health-linux/

To upload a file to a site, use the -T option. For HTTP(S) servers the upload is done with the PUT method:

$ curl -T "file1.jpg" http://www.uploadmania.com/upload

To require an SSL/TLS connection, use the --ssl-reqd flag. A single command can also enable SSL/TLS for several files:

$ curl --ssl-reqd -T "file.jpg" ftp://username:[email protected]/upload

With the -u option you can supply the login credentials:

$ curl -u username:password --ssl-reqd -T "file.jpg" ftp://ftp.uploadmania.com/upload

curl is a simple, reliable and capable program. It has a huge number of options and supports many protocols; we have covered only a small fraction of what it can do. If you want to master it, read the man page.

curl (1)

NAME

curl - transfer a URL

SYNOPSIS

curl [options] [URL...]

DESCRIPTION

curl is a tool to transfer data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP, FTPS, TFTP, DICT, TELNET, LDAP or FILE). The command is designed to work without user interaction.

curl offers a busload of useful tricks like proxy support, user authentication, ftp upload, HTTP post, SSL (https:) connections, cookies, file transfer resume and more. As you will see below, the amount of features will make your head spin!

curl is powered by libcurl for all transfer-related features. See libcurl(3) for details.

URL

The URL syntax is protocol dependent. You'll find a detailed description in RFC 3986.

You can specify multiple URLs or parts of URLs by writing part sets within braces as in:

or you can get sequences of alphanumeric series by using [] as in:

No nesting of the sequences is supported at the moment, but you can use several ones next to each other:

You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.

Since curl 7.15.1 you can also specify step counter for the ranges, so that you can get every Nth number or letter:

If you specify URL without protocol:// prefix, curl will attempt to guess what protocol you might want. It will then default to HTTP but try other protocols based on often-used host name prefixes. For example, for host names starting with "ftp." curl will assume you want to speak FTP.

PROGRESS METER

curl normally displays a progress meter during operations, indicating amount of transferred data, transfer speeds and estimated time left etc.

However, since curl displays data to the terminal by default, if you invoke curl to do an operation and it is about to write data to the terminal, it disables the progress meter as otherwise it would mess up the output mixing progress meter and response data.

If you want a progress meter for HTTP POST or PUT requests, you need to redirect the response output to a file, using shell redirect (>), -o or similar.

It is not the same case for FTP upload as that operation is not spitting out any response data to the terminal.

If you prefer a progress "bar" instead of the regular meter, -# is your friend.

OPTIONS

-a/--append (FTP) When used in an FTP upload, this will tell curl to append to the target file instead of overwriting it. If the file doesn"t exist, it will be created.

If this option is used twice, the second one will disable append mode again. -A/--user-agent (HTTP) Specify the User-Agent string to send to the HTTP server. Some badly done CGIs fail if its not set to "Mozilla/4.0". To encode blanks in the string, surround the string with single quote marks. This can also be set with the -H/--header option of course.

If this option is set more than once, the last one will be the one that"s used. --anyauth (HTTP) Tells curl to figure out authentication method by itself, and use the most secure one the remote site claims it supports. This is done by first doing a request and checking the response-headers, thus inducing an extra network round-trip. This is used instead of setting a specific authentication method, which you can do with --basic , --digest , --ntlm , and --negotiate .

Note that using --anyauth is not recommended if you do uploads from stdin, since it may require data to be sent twice and then the client must be able to rewind. If the need should arise when uploading from stdin, the upload operation will fail.

If this option is used several times, the following occurrences make no difference. -b/--cookie (HTTP) Pass the data to the HTTP server as a cookie. It is supposedly the data previously received from the server in a "Set-Cookie:" line. The data should be in the format "NAME1=VALUE1; NAME2=VALUE2".

If no "=" letter is used in the line, it is treated as a filename to use to read previously stored cookie lines from, which should be used in this session if they match. Using this method also activates the "cookie parser" which will make curl record incoming cookies too, which may be handy if you"re using this in combination with the -L/--location option. The file format of the file to read cookies from should be plain HTTP headers or the Netscape/Mozilla cookie file format.

NOTE that the file specified with -b/--cookie is only used as input. No cookies will be stored in the file. To store cookies, use the -c/--cookie-jar option or you could even save the HTTP headers to a file using -D/--dump-header !

If this option is set more than once, the last one will be the one that"s used. -B/--use-ascii Enable ASCII transfer when using FTP or LDAP. For FTP, this can also be enforced by using an URL that ends with ";type=A". This option causes data sent to stdout to be in text mode for win32 systems.

If this option is used twice, the second one will disable ASCII usage. --basic (HTTP) Tells curl to use HTTP Basic authentication. This is the default and this option is usually pointless, unless you use it to override a previously set option that sets a different authentication method (such as --ntlm , --digest and --negotiate ).

If this option is used several times, the following occurrences make no difference. --ciphers (SSL) Specifies which ciphers to use in the connection. The list of ciphers must be using valid ciphers. Read up on SSL cipher list details on this URL: http://www.openssl.org/docs/apps/ciphers.html

If this option is used several times, the last one will override the others. --compressed (HTTP) Request a compressed response using one of the algorithms libcurl supports, and return the uncompressed document. If this option is used and the server sends an unsupported encoding, Curl will report an error.

If this option is used several times, each occurrence will toggle it on/off. --connect-timeout Maximum time in seconds that you allow the connection to the server to take. This only limits the connection phase, once curl has connected this option is of no more use. See also the -m/--max-time option.

If this option is used several times, the last one will be used. -c/--cookie-jar Specify to which file you want curl to write all cookies after a completed operation. Curl writes all cookies previously read from a specified file as well as all cookies received from remote server(s). If no cookies are known, no file will be written. The file will be written using the Netscape cookie file format. If you set the file name to a single dash, "-", the cookies will be written to stdout.

NOTE If the cookie jar can"t be created or written to, the whole curl operation won"t fail or even report an error clearly. Using -v will get a warning displayed, but that is the only visible feedback you get about this possibly lethal situation.

If this option is used several times, the last specified file name will be used. -C/--continue-at Continue/Resume a previous file transfer at the given offset. The given offset is the exact number of bytes that will be skipped counted from the beginning of the source file before it is transferred to the destination. If used with uploads, the ftp server command SIZE will not be used by curl.

Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.
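The automatic resume described above lends itself to a small retry loop. The following is only a sketch: the function name resume_download, the attempt limit and the CURL override variable are assumptions made for illustration and are not part of curl.

```shell
# Hypothetical helper: keep resuming a download with "-C -" until curl
# reports success, giving up after a fixed number of attempts.
# $1 = URL, $2 = local output file, $3 = maximum number of attempts.
# CURL may be set to another command to try the loop without a network.
resume_download() {
    url=$1
    out=$2
    tries=$3
    n=1
    while [ "$n" -le "$tries" ]; do
        # "-C -" lets curl inspect the existing output file to find the offset.
        if "${CURL:-curl}" -C - -o "$out" "$url"; then
            echo "done after $n attempt(s)"
            return 0
        fi
        n=$(( n + 1 ))
    done
    echo "gave up after $tries attempt(s)" >&2
    return 1
}

# Example call (needs network access; the URL is a placeholder):
# resume_download http://example.com/big.iso big.iso 5
```

Each pass continues from whatever is already in the output file, so an interrupted transfer does not start over from byte zero.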

If this option is used several times, the last one will be used.

--create-dirs When used in conjunction with the -o option, curl will create the necessary local directory hierarchy as needed. This option creates the dirs mentioned with the -o option, nothing else. If the -o file name uses no dir or if the dirs it mentions already exist, no dir will be created.

To create remote directories when using FTP, try --ftp-create-dirs.

--crlf (FTP) Convert LF to CRLF in upload. Useful for MVS (OS/390).

If this option is used several times, the following occurrences make no difference.

-d/--data (HTTP) Sends the specified data in a POST request to the HTTP server, in a way that emulates a user filling in an HTML form and pressing the submit button. Note that the data is sent exactly as specified with no extra processing (with all newlines cut off). The data is expected to be "url-encoded". This will cause curl to pass the data to the server using the content-type application/x-www-form-urlencoded. Compare to -F/--form. If this option is used more than once on the same command line, the data pieces specified will be merged together with a separating & character. Thus, using "-d name=daniel -d skill=lousy" would generate a post chunk that looks like "name=daniel&skill=lousy".

If you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin. The contents of the file must already be url-encoded. Multiple files can also be specified. Posting data from a file named "foobar" would thus be done with "--data @foobar".

To post purely binary data, you should instead use the --data-binary option.

-d/--data is the same as --data-ascii .
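The merging rule for repeated -d options can be modelled in a few lines of shell. This is a sketch of the documented behavior only; the function name join_post_data is hypothetical and the code does not call curl.

```shell
# Hypothetical helper: model how the values of repeated -d/--data options
# are merged into one POST body. Each argument stands for one -d value.
join_post_data() {
    body=""
    for piece in "$@"; do
        if [ -z "$body" ]; then
            body=$piece
        else
            # Later pieces are appended with a separating "&".
            body="$body&$piece"
        fi
    done
    printf '%s\n' "$body"
}

# Corresponds to: curl -d name=daniel -d skill=lousy <url>
join_post_data name=daniel skill=lousy
```

Note that the pieces are joined as they are. Any url-encoding has to be done before the values are passed in, just as the text above requires for -d itself.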

If this option is used several times, the ones following the first will append data.

--data-ascii (HTTP) This is an alias for the -d/--data option.

If this option is used several times, the ones following the first will append data.

--data-binary (HTTP) This posts data in a similar manner as --data-ascii does, although when using this option the entire content of the posted data is kept as-is. If you want to post a binary file without the strip-newlines feature of the --data-ascii option, this is for you.

If this option is used several times, the ones following the first will append data.

--digest (HTTP) Enables HTTP Digest authentication. This is an authentication scheme that prevents the password from being sent over the wire in clear text. Use this in combination with the normal -u/--user option to set user name and password. See also --ntlm, --negotiate and --anyauth for related options.

If this option is used several times, the following occurrences make no difference.

--disable-eprt (FTP) Tell curl to disable the use of the EPRT and LPRT commands when doing active FTP transfers. Curl will normally always first attempt to use EPRT, then LPRT before using PORT, but with this option, it will use PORT right away. EPRT and LPRT are extensions to the original FTP protocol. They may not work on all servers, but they enable more functionality in a better way than the traditional PORT command.

If this option is used several times, each occurrence will toggle this on/off.

--disable-epsv (FTP) Tell curl to disable the use of the EPSV command when doing passive FTP transfers. Curl will normally always first attempt to use EPSV before PASV, but with this option, it will not try using EPSV.

If this option is used several times, each occurrence will toggle this on/off.

-D/--dump-header Write the protocol headers to the specified file.

This option is handy to use when you want to store the headers that an HTTP site sends to you. Cookies from the headers could then be read in a second curl invocation by using the -b/--cookie option. The -c/--cookie-jar option is however a better way to store cookies.

When used on FTP, the FTP server response lines are considered to be "headers" and thus are saved there.

If this option is used several times, the last one will be used.

-e/--referer (HTTP) Sends the "Referer Page" information to the HTTP server. This can also be set with the -H/--header flag of course. When used with -L/--location you can append ";auto" to the --referer URL to make curl automatically set the previous URL when it follows a Location: header. The ";auto" string can be used alone, even if you don't set an initial --referer.

If this option is used several times, the last one will be used.

--engine Select the OpenSSL crypto engine to use for cipher operations. Use --engine list to print a list of build-time supported engines. Note that not all (or none) of the engines may be available at run-time.

--environment (RISC OS ONLY) Sets a range of environment variables, using the names the -w option supports, to more easily allow extraction of useful information after having run curl.

If this option is used several times, each occurrence will toggle this on/off.

--egd-file (HTTPS) Specify the path name to the Entropy Gathering Daemon socket. The socket is used to seed the random engine for SSL connections. See also the --random-file option.

-E/--cert (HTTPS) Tells curl to use the specified certificate file when getting a file with HTTPS. The certificate must be in PEM format. If the optional password isn't specified, it will be queried for on the terminal. Note that this certificate is the private key and the private certificate concatenated!

If this option is used several times, the last one will be used.

--cert-type (SSL) Tells curl what certificate type the provided certificate is in. PEM, DER and ENG are recognized types.

If this option is used several times, the last one will be used.

--cacert (HTTPS) Tells curl to use the specified certificate file to verify the peer. The file may contain multiple CA certificates. The certificate(s) must be in PEM format.

curl recognizes the environment variable named "CURL_CA_BUNDLE" if that is set, and uses the given path as a path to a CA cert bundle. This option overrides that variable.

The Windows version of curl will automatically look for a CA certs file named "curl-ca-bundle.crt", either in the same directory as curl.exe, or in the current working directory, or in any folder along your PATH.

If this option is used several times, the last one will be used.

--capath (HTTPS) Tells curl to use the specified certificate directory to verify the peer. The certificates must be in PEM format, and the directory must have been processed using the c_rehash utility supplied with openssl. Using --capath can allow curl to make https connections much more efficiently than using --cacert if the --cacert file contains many CA certificates.

If this option is used several times, the last one will be used.

-f/--fail (HTTP) Fail silently (no output at all) on server errors. This is mostly done to let scripts and similar tools deal better with failed attempts. Normally, when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why). This flag will prevent curl from outputting that document and make it return error 22 instead.
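In a script, -f is typically combined with a check of curl's exit status. The sketch below assumes a hypothetical wrapper named fetch_page. The CURL variable is likewise an assumption, added only so the function can be tried without a network.

```shell
# Hypothetical wrapper: download URL ($1) into file ($2) and report failure.
# With -f, an HTTP error makes curl exit non-zero instead of saving the
# server's error page. CURL may be set to another command for a dry run.
fetch_page() {
    if "${CURL:-curl}" -f -s -o "$2" "$1"; then
        echo "saved $1 to $2"
    else
        status=$?
        echo "download failed (curl exit code $status)" >&2
        return "$status"
    fi
}

# Example call (needs network access; the URL is a placeholder):
# fetch_page http://example.com/report.html report.html || exit 1
```

Without -f, curl would exit with status 0 and the output file would contain the server's error page, which is easy to miss in a batch job.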

If this option is used twice, the second will again disable silent failure.

--ftp-account (FTP) When an FTP server asks for "account data" after user name and password has been provided, this data is sent off using the ACCT command. (Added in 7.13.0)

If this option is used twice, the second will override the previous use.

--ftp-create-dirs (FTP) When an FTP URL/operation uses a path that doesn't currently exist on the server, the standard behavior of curl is to fail. Using this option, curl will instead attempt to create missing directories.

If this option is used twice, the second will again disable directory creation.

--ftp-method (FTP) Control what method curl should use to reach a file on an FTP(S) server. The method argument should be one of the following alternatives:

multicwd: curl does a single CWD operation for each path part in the given URL. For deep hierarchies this means very many commands. This is how RFC1738 says it should be done. This is the default but the slowest behavior.

nocwd: curl does no CWD at all. curl will do SIZE, RETR, STOR etc and give a full path to the server for all these commands. This is the fastest behavior.

singlecwd: curl does one CWD with the full target directory and then operates on the file "normally" (like in the multicwd case). This is somewhat more standards compliant than "nocwd" but without the full penalty of "multicwd".

--ftp-pasv (FTP) Use PASV when transferring. PASV is the internal default behavior, but this option can be used to override a previous --ftp-port option. (Added in 7.11.0)

If this option is used several times, the following occurrences make no difference.

--ftp-alternative-to-user (FTP) If authenticating with the USER and PASS commands fails, send this command. When connecting to Tumbleweed's Secure Transport server over FTPS using a client certificate, using "SITE AUTH" will tell the server to retrieve the username from the certificate. (Added in 7.15.5)

--ftp-skip-pasv-ip (FTP) Tell curl to not use the IP address the server suggests in its response to curl's PASV command when curl connects the data connection. Instead curl will re-use the same IP address it already uses for the control connection. (Added in 7.14.2)

This option has no effect if PORT, EPRT or EPSV is used instead of PASV.

If this option is used twice, the second will again use the server's suggested address.

--ftp-ssl (FTP) Try to use SSL/TLS for the FTP connection. Reverts to a non-secure connection if the server doesn't support SSL/TLS. (Added in 7.11.0)

If this option is used twice, the second will again disable this.

--ftp-ssl-reqd (FTP) Require SSL/TLS for the FTP connection. Terminates the connection if the server doesn't support SSL/TLS. (Added in 7.15.5)

If this option is used twice, the second will again disable this.

-F/--form (HTTP) This lets curl emulate a filled-in form in which a user has pressed the submit button. This causes curl to POST data using the Content-Type multipart/form-data according to RFC1867. This enables uploading of binary files etc. To force the "content" part to be a file, prefix the file name with an @ sign. To just get the content part from a file, prefix the file name with the letter <. The difference between @ and < is that @ makes a file get attached in the post as a file upload, while < makes a text field and just gets the contents for that text field from a file.

Example, to send your password file to the server, where "password" is the name of the form-field to which /etc/passwd will be the input:

curl -F password=@/etc/passwd www.mypasswords.com

To read the file's content from stdin instead of a file, use - where the file name should've been. This goes for both @ and < constructs.

You can also tell curl what Content-Type to use by using "type=", in a manner similar to:

curl -F "web=@index.html;type=text/html" url.com

curl -F "name=daniel;type=text/foo" url.com

You can also explicitly change the name field of a file upload part by setting filename=, like this:

curl -F "file=@localfile;filename=nameinpost" url.com

See further examples and details in the MANUAL.

This option can be used multiple times.

--form-string (HTTP) Similar to --form except that the value string for the named parameter is used literally. Leading "@" and "<" characters, and the ";type=" string in the value have no special meaning. Use this in preference to --form if there's any possibility that the string value may accidentally trigger the "@" or "<" features of --form.

-g/--globoff This option switches off the "URL globbing parser". When you set this option, you can specify URLs that contain the letters {}[] without having them being interpreted by curl itself. Note that these letters are not normal legal URL contents but they should be encoded according to the URI standard.

-G/--get When used, this option will make all data specified with -d/--data or --data-binary be used in an HTTP GET request instead of the POST request that otherwise would be used. The data will be appended to the URL with a "?" separator.

If used in combination with -I, the POST data will instead be appended to the URL with a HEAD request.
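To make the effect of -G concrete, the following sketch builds the URL that results when -d pieces are moved into the query string. The function name build_get_url is hypothetical; the code only models the behavior described above and does not call curl.

```shell
# Hypothetical helper: model the URL produced by -G together with -d.
# $1 is the URL; the remaining arguments stand for the -d values.
build_get_url() {
    url=$1
    shift
    query=""
    for piece in "$@"; do
        # Join the pieces with "&", exactly as for a POST body.
        query="${query:+$query&}$piece"
    done
    # The joined data is appended after a "?" separator (if there is any).
    printf '%s\n' "${url}${query:+?$query}"
}

# Corresponds to: curl -G -d q=curl -d lang=en http://example.com/search
build_get_url http://example.com/search q=curl lang=en
```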

If this option is used several times, the following occurrences make no difference.

-h/--help Usage help.

-H/--header

(HTTP) Extra header to use when getting a web page. You may specify any number of extra headers. Note that if you should add a custom header that has the same name as one of the internal ones curl would use, your externally set header will be used instead of the internal one. This allows you to make even trickier stuff than curl would normally do. You should not replace internally set headers without knowing perfectly well what you're doing. Replacing an internal header with one without content on the right side of the colon will prevent that header from appearing.

curl will make sure that each header you add/replace gets sent with the proper end of line marker. You should thus not add that as a part of the header content: do not add newlines or carriage returns, as they will only mess things up for you.

See also the -A/--user-agent and -e/--referer options.
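The three uses of -H (adding, replacing and removing a header) can be combined in one call. The sketch below is illustrative only: the function name fetch_with_headers, the header values and the CURL override variable are all assumptions.

```shell
# Hypothetical wrapper: $1 is the URL to fetch.
# CURL may be set to another command to inspect the arguments in a dry run.
fetch_with_headers() {
    # 1st -H: adds a custom header that curl would not send by itself.
    # 2nd -H: replaces curl's internally generated User-Agent header.
    # 3rd -H: nothing after the colon, so the Accept header is not sent.
    "${CURL:-curl}" -s \
        -H "X-Request-Source: nightly-job" \
        -H "User-Agent: my-script/1.0" \
        -H "Accept:" \
        "$1"
}

# Example call (needs network access; the URL is a placeholder):
# fetch_with_headers http://example.com/
```

Adding -v to the call shows the request headers actually sent, which is a quick way to confirm the result.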

This option can be used multiple times to add/replace/remove multiple headers.

--ignore-content-length (HTTP) Ignore the Content-Length header. This is particularly useful for servers running Apache 1.x, which will report incorrect Content-Length for files larger than 2 gigabytes.

-i/--include (HTTP) Include the HTTP-header in the output. The HTTP-header includes things like server-name, date of the document, HTTP-version and more...

If this option is used twice, the second will again disable header include.

--interface Perform an operation using a specified interface. You can enter interface name, IP address or host name. An example could look like:

curl --interface eth0:1 http://www.netscape.com/

If this option is used several times, the last one will be used.

-I/--head (HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.

If this option is used twice, the second will again disable header only.

-j/--junk-session-cookies (HTTP) When curl is told to read cookies from a given file, this option will make it discard all "session cookies". This will basically have the same effect as if a new session is started. Typical browsers always discard session cookies when they're closed down.

If this option is used several times, each occurrence will toggle this on/off.

-k/--insecure (SSL) This option explicitly allows curl to perform "insecure" SSL connections and transfers. All SSL connections are attempted to be made secure by using the CA certificate bundle installed by default. This makes all connections considered "insecure" fail unless -k/--insecure is used.

If this option is used twice, the second time will again disable it.

--key (SSL) Private key file name. Allows you to provide your private key in this separate file.

If this option is used several times, the last one will be used.

--key-type (SSL) Private key file type. Specify which type your --key provided private key is. DER, PEM and ENG are supported.

If this option is used several times, the last one will be used.

--krb4 (FTP) Enable kerberos4 authentication and use. The level must be entered and should be one of "clear", "safe", "confidential" or "private". Should you use a level that is not one of these, "private" will instead be used.

This option requires that the library was built with kerberos4 support. This is not very common. Use -V/--version to see if your curl supports it.

If this option is used several times, the last one will be used.

-K/--config Specify which config file to read curl arguments from. The config file is a text file in which command line arguments can be written which then will be used as if they were written on the actual command line. Options and their parameters must be specified on the same config file line. If the parameter is to contain white spaces, the parameter must be enclosed within quotes. If the first column of a config line is a "#" character, the rest of the line will be treated as a comment.

Specify the filename as "-" to make curl read the file from stdin.

Note that to be able to specify a URL in the config file, you need to specify it using the --url option, and not by simply writing the URL on its own line. So, it could look similar to this:

url = "http://curl.haxx.se/docs/"

This option can be used multiple times.

When curl is invoked, it always (unless -q is used) checks for a default config file and uses it if found. The default config file is checked for in the following places in this order:

1) curl tries to find the "home dir": It first checks for the CURL_HOME and then the HOME environment variables. Failing that, it uses getpwuid() on unix-like systems (which returns the home dir given the current user in your system). On Windows, it then checks for the APPDATA variable, or as a last resort the "%USERPROFILE%\Application Data".

2) On Windows, if there is no _curlrc file in the home dir, it checks for one in the same dir the executable curl is placed. On unix-like systems, it will simply try to load .curlrc from the determined home dir.

--limit-rate Specify the maximum transfer rate you want curl to use. This feature is useful if you have a limited pipe and you'd like your transfer not to use your entire bandwidth.

The given speed is measured in bytes/second, unless a suffix is appended. Appending "k" or "K" will count the number as kilobytes, "m" or "M" makes it megabytes while "g" or "G" makes it gigabytes. Examples: 200K, 3m and 1G.
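The suffix rules can be expressed as a small conversion routine. This is a sketch under one explicit assumption: each suffix step is taken as a factor of 1024, which the text above does not state. The function name rate_to_bytes is hypothetical.

```shell
# Hypothetical helper: convert a --limit-rate style value to bytes/second.
# ASSUMPTION: k, m and g are powers of 1024; the text above only says
# "kilobytes", "megabytes" and "gigabytes".
rate_to_bytes() {
    value=$1
    case $value in
        # ${value%?} strips the one-letter suffix before multiplying.
        *[kK]) echo $(( ${value%?} * 1024 )) ;;
        *[mM]) echo $(( ${value%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${value%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$value" ;;
    esac
}

# The three examples from the text:
rate_to_bytes 200K
rate_to_bytes 3m
rate_to_bytes 1G
```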

If you are also using the -Y/--speed-limit option, that option will take precedence and might cripple the rate-limiting slightly, to help keep the speed-limit logic working.

If this option is used several times, the last one will be used.

-l/--list-only (FTP) When listing an FTP directory, this switch forces a name-only view. Especially useful if you want to machine-parse the contents of an FTP directory since the normal directory view doesn't use a standard look or format.

This option causes an FTP NLST command to be sent. Some FTP servers list only files in their response to NLST; they do not include subdirectories and symbolic links.

If this option is used twice, the second will again disable list only.

--local-port <num>[-num] Set a preferred number or range of local port numbers to use for the connection(s). Note that port numbers are by nature a scarce resource that will be busy at times, so setting this range to something too narrow might cause unnecessary connection setup failures. (Added in 7.15.2)

-L/--location (HTTP/HTTPS) If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place. If used together with -i/--include or -I/--head, headers from all requested pages will be shown. When authentication is used, curl only sends its credentials to the initial host. If a redirect takes curl to a different host, that host won't be able to intercept the user+password. See also --location-trusted on how to change this. You can limit the amount of redirects to follow by using the --max-redirs option.

If this option is used twice, the second will again disable location following.

--location-trusted (HTTP/HTTPS) Like -L/--location, but will allow sending the name + password to all hosts that the site may redirect to. This may or may not introduce a security breach if the site redirects you to a site to which you'll send your authentication info (which is plaintext in the case of HTTP Basic authentication).

If this option is used twice, the second will again disable location following.

--max-filesize Specify the maximum size (in bytes) of a file to download. If the file requested is larger than this value, the transfer will not start and curl will return with exit code 63.

NOTE: The file size is not always known prior to download, and for such files this option has no effect even if the file transfer ends up being larger than this given limit. This concerns both FTP and HTTP transfers.

-m/--max-time Maximum time in seconds that you allow the whole operation to take. This is useful for preventing your batch jobs from hanging for hours due to slow networks or links going down. See also the --connect-timeout option.
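For batch jobs, the two time limits are often set together: --connect-timeout bounds the connection phase and -m bounds the whole operation. The sketch below is illustrative. The function name fetch_with_timeouts, the 10 and 60 second values and the CURL override variable are assumptions.

```shell
# Hypothetical wrapper: fetch URL ($1) into file ($2) with both limits set.
# The values 10 and 60 are arbitrary choices for this illustration.
# CURL may be set to another command to inspect the arguments in a dry run.
fetch_with_timeouts() {
    "${CURL:-curl}" --connect-timeout 10 -m 60 -s -S -o "$2" "$1"
}

# Dry run: print the arguments instead of contacting a server.
CURL=echo
fetch_with_timeouts http://example.com/ page.html
```

The -s and -S pair keeps the progress meter quiet while still printing an error message if a limit is hit.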

If this option is used several times, the last one will be used.

-M/--manual Manual. Display the huge help text.

-n/--netrc Makes curl scan the .netrc file in the user's home directory for login name and password. This is typically used for ftp on unix. If used with http, curl will enable user authentication. See netrc(4) or ftp(1) for details on the file format. Curl will not complain if that file doesn't have the right permissions (it should not be world nor group readable). The environment variable "HOME" is used to find the home directory.

A quick and very simple example of how to set up a .netrc to allow curl to ftp to the machine host.domain.com with user name "myself" and password "secret" should look similar to:

machine host.domain.com login myself password secret
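Since curl does not complain about loose permissions on this file, it is worth creating it with restrictive ones from the start. The sketch below writes the entry shown above into a scratch directory. NETRC_DIR is a hypothetical variable used only so the example does not touch a real home directory.

```shell
# Sketch: create a .netrc that only the owner can read or write.
# NETRC_DIR is a scratch location for this illustration; curl itself
# looks for .netrc in the directory named by $HOME.
NETRC_DIR=$(mktemp -d)
netrc_file="$NETRC_DIR/.netrc"

# umask 077 inside the subshell applies only to this one file creation.
(
    umask 077
    printf 'machine host.domain.com login myself password secret\n' > "$netrc_file"
)

# Show the resulting permissions and owner.
ls -l "$netrc_file"
```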

If this option is used twice, the second will again disable netrc usage.

--netrc-optional Very similar to --netrc, but this option makes the .netrc usage optional and not mandatory as the --netrc does.

--negotiate (HTTP) Enables GSS-Negotiate authentication. The GSS-Negotiate method was designed by Microsoft and is used in their web applications. It is primarily meant as a support for Kerberos5 authentication but may also be used along with other authentication methods. For more information see IETF draft draft-brezak-spnego-http-04.txt.

This option requires that the library was built with GSSAPI support. This is not very common. Use -V/--version to see if your version supports GSS-Negotiate.

When using this option, you must also provide a fake -u/--user option to activate the authentication code properly. Sending a "-u:" is enough as the user name and password from the -u option aren't actually used.

If this option is used several times, the following occurrences make no difference.

-N/--no-buffer Disables the buffering of the output stream. In normal work situations, curl will use a standard buffered output stream that will have the effect that it will output the data in chunks, not necessarily exactly when the data arrives. Using this option will disable that buffering.

If this option is used twice, the second will again switch on buffering.

--ntlm (HTTP) Enables NTLM authentication. The NTLM authentication method was designed by Microsoft and is used by IIS web servers. It is a proprietary protocol, reverse engineered by clever people and implemented in curl based on their efforts. This kind of behavior should not be endorsed; you should encourage everyone who uses NTLM to switch to a public and documented authentication method instead, such as Digest.

If you want to enable NTLM for your proxy authentication, then use --proxy-ntlm .

This option requires that the library was built with SSL support. Use -V/--version to see if your curl supports NTLM.

If this option is used several times, the following occurrences make no difference.

-o/--output <file> Write output to <file> instead of stdout. If you are using {} or [] to fetch multiple documents, you can use "#" followed by a number in the <file> specifier. That variable will be replaced with the current string for the URL being fetched. Like in:

curl http://{one,two}.site.com -o "file_#1.txt"

You may use this option as many times as you have URLs.

See also the --create-dirs option to create the local directories dynamically.

-O/--remote-name Write output to a local file named like the remote file we get. (Only the file part of the remote file is used, the path is cut off.)

The remote file name to use for saving is extracted from the given URL, nothing else.

You may use this option as many times as you have URLs.

--pass (SSL) Pass phrase for the private key.

If this option is used several times, the last one will be used.

--proxy-anyauth Tells curl to pick a suitable authentication method when communicating with the given proxy. This will cause an extra request/response round-trip. (Added in 7.13.2)

If this option is used twice, the second will again disable the proxy use-any authentication.

--proxy-basic Tells curl to use HTTP Basic authentication when communicating with the given proxy. Use --basic for enabling HTTP Basic with a remote host. Basic is the default authentication method curl uses with proxies.

If this option is used twice, the second will again disable proxy HTTP Basic authentication.

--proxy-digest Tells curl to use HTTP Digest authentication when communicating with the given proxy. Use --digest for enabling HTTP Digest with a remote host.

If this option is used twice, the second will again disable proxy HTTP Digest.

--proxy-ntlm Tells curl to use HTTP NTLM authentication when communicating with the given proxy. Use --ntlm for enabling NTLM with a remote host.

If this option is used twice, the second will again disable proxy HTTP NTLM.

-p/--proxytunnel When an HTTP proxy is used (-x/--proxy), this option will cause non-HTTP protocols to attempt to tunnel through the proxy instead of merely using it to do HTTP-like operations. The tunnel approach is made with the HTTP proxy CONNECT request and requires that the proxy allows direct connect to the remote port number curl wants to tunnel through to.

If this option is used twice, the second will again disable proxy tunnel.

-P/--ftp-port <address>

(FTP) Reverses the initiator/listener roles when connecting with ftp. This switch makes curl use the PORT command instead of PASV. In practice, PORT tells the server to connect to the client's specified address and port, while PASV asks the server for an IP address and port to connect to.

<address> should be one of:

interface: i.e. "eth0" to specify which interface's IP address you want to use (Unix only)

IP address: i.e. "192.168.10.1" to specify exact IP number

host name: i.e. "my.host.domain" to specify machine

-: make curl pick the same IP address that is already used for the control connection

If this option is used several times, the last one will be used. Disable the use of PORT with --ftp-pasv. Disable the attempt to use the EPRT command instead of PORT by using --disable-eprt. EPRT is really PORT++.

-q If used as the first parameter on the command line, the curlrc config file will not be read and used. See the -K/--config for details on the default config file search path.

-Q/--quote (FTP) Send an arbitrary command to the remote FTP server. Quote commands are sent BEFORE the transfer is taking place (just after the initial PWD command to be exact). To make commands take place after a successful transfer, prefix them with a dash "-". To make commands get sent after libcurl has changed working directory, just before the transfer command(s), prefix the command with "+". You may specify any amount of commands. If the server returns failure for one of the commands, the entire operation will be aborted. You must send syntactically correct FTP commands as RFC959 defines.

This option can be used multiple times.

--random-file (HTTPS) Specify the path name to file containing what will be considered as random data. The data is used to seed the random engine for SSL connections. See also the --egd-file option.

-r/--range (HTTP/FTP) Retrieve a byte range (i.e. a partial document) from an HTTP/1.1 or FTP server. Ranges can be specified in a number of ways.

0-499 specifies the first 500 bytes

500-999 specifies the second 500 bytes

-500 specifies the last 500 bytes

9500- specifies the bytes from offset 9500 and forward

0-0,-1 specifies the first and last byte only(*)(H)

500-700,600-799 specifies 300 bytes from offset 500(H)

100-199,500-599 specifies two separate 100 bytes ranges(*)(H)

(*) = NOTE that this will cause the server to reply with a multipart response!

You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.

FTP range downloads only support the simple syntax "start-stop" (optionally with one of the numbers omitted). It depends on the non-RFC command SIZE.
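Because both ends of a range are inclusive, it is easy to be off by one when working out how many bytes a range selects. The following sketch computes the length for the simple forms only ("start-stop", "start-" and "-suffix"). The function name range_length is hypothetical, and the code does no validation of its input.

```shell
# Hypothetical helper: number of bytes selected by one simple range spec.
# $1 = range spec, $2 = total size of the document in bytes.
range_length() {
    spec=$1
    size=$2
    start=${spec%%-*}   # text before the first "-" (empty for "-500")
    stop=${spec#*-}     # text after the first "-"  (empty for "9500-")
    if [ -z "$start" ]; then
        # "-N": the last N bytes, or the whole document if it is shorter.
        if [ "$stop" -gt "$size" ]; then
            stop=$size
        fi
        echo "$stop"
    else
        # "N-": from offset N to the end; a stop past the end is clamped.
        if [ -z "$stop" ] || [ "$stop" -ge "$size" ]; then
            stop=$(( size - 1 ))
        fi
        # Both offsets are inclusive, hence the "+ 1".
        echo $(( stop - start + 1 ))
    fi
}

# The simple examples from the list above, for a 10000-byte document:
range_length 0-499 10000
range_length -500 10000
range_length 9500- 10000
```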

If this option is used several times, the last one will be used.

-R/--remote-time When used, this will make libcurl attempt to figure out the timestamp of the remote file, and if that is available make the local file get that same timestamp.

If this option is used twice, the second time disables this again.

--retry If a transient error is returned when curl tries to perform a transfer, it will retry this number of times before giving up. Setting the number to 0 makes curl do no retries (which is the default). Transient error means either: a timeout, an FTP 5xx response code or an HTTP 5xx response code.

When curl is about to retry a transfer, it will first wait one second and then for all forthcoming retries it will double the waiting time until it reaches 10 minutes which then will be the delay between the rest of the retries. By using --retry-delay you disable this exponential backoff algorithm. See also --retry-max-time to limit the total time allowed for retries. (Added in 7.12.3)
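The default waiting times follow directly from the rule above and can be listed with a short loop. This is a model of the documented schedule only, not of curl's internals. The function name retry_delays is hypothetical.

```shell
# Hypothetical helper: print the default delay (in seconds) before each
# retry, following the rule above: start at 1 second, double each time,
# and cap at 10 minutes (600 seconds). $1 = number of retries.
retry_delays() {
    delay=1
    attempt=1
    while [ "$attempt" -le "$1" ]; do
        echo "$delay"
        delay=$(( delay * 2 ))
        if [ "$delay" -gt 600 ]; then
            delay=600
        fi
        attempt=$(( attempt + 1 ))
    done
}

# Delays before each of 12 retries, one per line.
retry_delays 12
```

The cap is reached at the eleventh retry, so a large --retry count mostly means waiting ten minutes between attempts. That is where --retry-max-time becomes useful.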

If this option is used multiple times, the last occurrence decides the amount.

--retry-delay Make curl sleep this amount of time between each retry when a transfer has failed with a transient error (it changes the default backoff time algorithm between retries). This option is only interesting if --retry is also used. Setting this delay to zero will make curl use the default backoff time. (Added in 7.12.3)

If this option is used multiple times, the last occurrence decides the amount.

--retry-max-time The retry timer is reset before the first transfer attempt. Retries will be done as usual (see --retry) as long as the timer hasn't reached this given limit. Notice that if the timer hasn't reached the limit, the request will be made and while performing, it may take longer than this given time period. To limit a single request's maximum time, use -m/--max-time. Set this option to zero to not timeout retries. (Added in 7.12.3)

If this option is used multiple times, the last occurrence decides the amount.

-s/--silent Silent mode. Don't show progress meter or error messages. Makes curl mute.

If this option is used twice, the second will again disable silent mode.

-S/--show-error When used with -s it makes curl show an error message if it fails.

If this option is used twice, the second will again disable show error.

--socks4 Use the specified SOCKS4 proxy. If the port number is not specified, it is assumed at port 1080. (Added in 7.15.2)

This option overrides any previous use of -x/--proxy, as they are mutually exclusive.

If this option is used several times, the last one will be used.

--socks5 Use the specified SOCKS5 proxy. If the port number is not specified, it is assumed at port 1080. (Added in 7.11.1)

This option overrides any previous use of -x/--proxy , as they are mutually exclusive.

If this option is used several times, the last one will be used. (This option was previously wrongly documented and used as --socks without the number appended.)

--stderr Redirect all writes to stderr to the specified file instead. If the file name is a plain "-", it is instead written to stdout. This option has no point when you're using a shell with decent redirecting capabilities.

If this option is used several times, the last one will be used.

--tcp-nodelay Turn on the TCP_NODELAY option. See the curl_easy_setopt(3) man page for details about this option. (Added in 7.11.2)

If this option is used several times, each occurrence toggles this on/off.

-t/--telnet-option Pass options to the telnet protocol. Supported options are:

TTYPE= Sets the terminal type.

XDISPLOC= Sets the X display location.

NEW_ENV= Sets an environment variable. -T/--upload-file This transfers the specified local file to the remote URL. If there is no file part in the specified URL, Curl will append the local file name. NOTE that you must use a trailing / on the last directory to really prove to Curl that there is no file name or curl will think that your last directory name is the remote file name to use. That will most likely cause the upload operation to fail. If this is used on a http(s) server, the PUT command will be used.

Use the file name "-" (a single dash) to use stdin instead of a given file.

You can specify one -T for each URL on the command line. Each -T + URL pair specifies what to upload and to where. curl also supports "globbing" of the -T argument, meaning that you can upload multiple files to a single URL by using the same URL globbing style supported in the URL, like this:

FILES

~/.curlrc Default config file, see -K/--config for details.

ENVIRONMENT

http_proxy [protocol://]<host>[:port]
Sets proxy server to use for HTTP.

HTTPS_PROXY [protocol://]<host>[:port]
Sets proxy server to use for HTTPS.

FTP_PROXY [protocol://]<host>[:port]
Sets proxy server to use for FTP.

ALL_PROXY [protocol://]<host>[:port]
Sets proxy server to use if no protocol-specific proxy is set.

NO_PROXY <comma-separated list of hosts>
List of host names that shouldn't go through any proxy. If set to an asterisk "*" only, it matches all hosts.

EXIT CODES

There exists a bunch of different error codes and their corresponding error messages that may appear during bad conditions. At the time of this writing, the exit codes are:

1 Unsupported protocol. This build of curl has no support for this protocol.
2 Failed to initialize.
3 URL malformat. The syntax was not correct.
4 URL user malformatted. The user-part of the URL syntax was not correct.
5 Couldn't resolve proxy. The given proxy host could not be resolved.
6 Couldn't resolve host. The given remote host was not resolved.
7 Failed to connect to host.
8 FTP weird server reply. The server sent data curl couldn't parse.
9 FTP access denied. The server denied login or denied access to the particular resource or directory you wanted to reach. Most often you tried to change to a directory that doesn't exist on the server.
10 FTP user/password incorrect. Either one or both were not accepted by the server.
11 FTP weird PASS reply. Curl couldn't parse the reply sent to the PASS request.
12 FTP weird USER reply. Curl couldn't parse the reply sent to the USER request.
13 FTP weird PASV reply. Curl couldn't parse the reply sent to the PASV request.
14 FTP weird 227 format. Curl couldn't parse the 227-line the server sent.
15 FTP can't get host. Couldn't resolve the host IP we got in the 227-line.
16 FTP can't reconnect. Couldn't connect to the host we got in the 227-line.
17 FTP couldn't set binary. Couldn't change transfer method to binary.
18 Partial file. Only a part of the file was transferred.
19 FTP couldn't download/access the given file, the RETR (or similar) command failed.
20 FTP write error. The transfer was reported bad by the server.
21 FTP quote error. A quote command returned error from the server.
22 HTTP page not retrieved. The requested url was not found or returned another error with the HTTP error code being 400 or above. This return code only appears if -f/--fail is used.
23 Write error. Curl couldn't write data to a local filesystem or similar.
24 Malformed user. User name badly specified.
25 FTP couldn't STOR file. The server denied the STOR operation, used for FTP uploading.
26 Read error. Various reading problems.
27 Out of memory. A memory allocation request failed.
28 Operation timeout. The specified time-out period was reached according to the conditions.
29 FTP couldn't set ASCII. The server returned an unknown reply.
30 FTP PORT failed. The PORT command failed. Not all FTP servers support the PORT command, try doing a transfer using PASV instead!
31 FTP couldn't use REST. The REST command failed. This command is used for resumed FTP transfers.
32 FTP couldn't use SIZE. The SIZE command failed. The command is an extension to the original FTP spec RFC 959.
33 HTTP range error. The range "command" didn't work.
34 HTTP post error. Internal post-request generation error.
35 SSL connect error. The SSL handshaking failed.
36 FTP bad download resume. Couldn't continue an earlier aborted download.
37 FILE couldn't read file. Failed to open the file. Permissions?
38 LDAP cannot bind. LDAP bind operation failed.
39 LDAP search failed.
40 Library not found. The LDAP library was not found.
41 Function not found. A required LDAP function was not found.
42 Aborted by callback. An application told curl to abort the operation.
43 Internal error. A function was called with a bad parameter.
44 Internal error. A function was called in a bad order.
45 Interface error. A specified outgoing interface could not be used.
46 Bad password entered. An error was signaled when the password was entered.
47 Too many redirects. When following redirects, curl hit the maximum amount.
48 Unknown TELNET option specified.
49 Malformed telnet option.
51 The remote peer's SSL certificate wasn't ok
52 The server didn't reply anything, which here is considered an error.
53 SSL crypto engine not found
54 Cannot set SSL crypto engine as default
55 Failed sending network data
56 Failure in receiving network data
57 Share is in use (internal error)
58 Problem with the local certificate
59 Couldn't use specified SSL cipher
60 Problem with the CA cert (path? permission?)
61 Unrecognized transfer encoding
62 Invalid LDAP URL
63 Maximum file size exceeded
64 Requested FTP SSL level failed
65 Sending the data requires a rewind that failed
66 Failed to initialise SSL Engine
67 User, password or similar was not accepted and curl failed to login
68 File not found on TFTP server
69 Permission problem on TFTP server
70 Out of disk space on TFTP server
71 Illegal TFTP operation
72 Unknown TFTP transfer ID
73 File already exists (TFTP)
74 No such user (TFTP)
75 Character conversion failed
76 Character conversion functions required
XX There will appear more error codes here in future releases. The existing ones are meant to never change.
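In a script, these exit codes are usually checked right after the curl call. The sketch below maps a handful of the codes listed above to short messages; the function name describe_curl_exit and the wording of the messages are illustrative assumptions, not part of curl.

```shell
#!/bin/sh
# Hypothetical helper: translate a few curl exit codes into short messages.
# Only a subset of the documented codes is covered; the rest fall through.
describe_curl_exit() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    22) echo "HTTP error 400 or above (only reported with -f/--fail)" ;;
    28) echo "operation timeout" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# Usage sketch (example.com is a placeholder URL):
#   curl -sS -f -o page.html http://example.com/ || describe_curl_exit "$?"
```

The catch-all branch keeps the helper usable even for codes that are added in later curl releases.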

cURL is a tool for transferring files and data using URL syntax. It supports many protocols, such as HTTP, FTP, TELNET and others. cURL was originally designed as a command-line tool. Fortunately for us, the cURL library is also supported by PHP. In this article we will look at some of the advanced features of cURL and at how to put them to practical use in PHP.

Why cURL?

There are in fact quite a few alternative ways to fetch the content of a web page. In many cases, mostly out of laziness, I have used simple PHP functions instead of cURL:

$content = file_get_contents("http://www.nettuts.com");
// or
$lines = file("http://www.nettuts.com");
// or
readfile("http://www.nettuts.com");

However, these functions have virtually no flexibility and fall short in areas such as error handling. In addition, there are tasks that you simply cannot accomplish with these standard functions: working with cookies, authentication, form submission, file uploads and so on.

cURL is a powerful library that supports many different protocols and options and provides detailed information about URL requests.

Basic structure

1. Initialization
2. Setting the options
3. Executing the request and fetching the result
4. Freeing the handle

// 1. initialize
$ch = curl_init();

// 2. set the options, including the url
curl_setopt($ch, CURLOPT_URL, "http://www.nettuts.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);

// 3. execute and fetch the resulting HTML
$output = curl_exec($ch);

// 4. free the curl handle
curl_close($ch);

Step 2 (the curl_setopt() calls) gets far more attention in this article than the other steps, because that is where everything interesting and useful happens. cURL has a large number of options that let you configure a URL request in detail. We will not go through the whole list, only the options I consider useful for this tutorial. You can explore the rest yourself if the topic interests you.

Checking for errors

You can also add a conditional check to see whether the request succeeded:

// ...
$output = curl_exec($ch);
if ($output === FALSE) {
    echo "cURL Error: " . curl_error($ch);
}
// ...

Note an important point here: we must use "=== false" for the comparison instead of "== false". This lets us distinguish an empty result from the boolean value false, which is what indicates an error.
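The same distinction exists on the command line, where an empty response body is not a failure and only a non-zero exit status is. The sketch below is one way to express that in a shell script; the function name fetch_body is a hypothetical choice for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the body of a URL, or report the curl exit code.
# An empty body with exit status 0 is treated as success, not as an error.
fetch_body() {
  # -s hides the progress meter, -S still shows the error message on failure
  if body=$(curl -sS "$1"); then
    printf '%s' "$body"
  else
    status=$?
    echo "curl error, exit code $status" >&2
    return "$status"
  fi
}

# Usage sketch (example.com is a placeholder URL):
#   fetch_body http://example.com/ > page.html || echo "download failed"
```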

Getting information

Another optional step is to retrieve information about the cURL request after it has been executed.

// ...
curl_exec($ch);
$info = curl_getinfo($ch);
echo "Took " . $info["total_time"] . " seconds for url " . $info["url"];
// ...

The returned array contains the following keys:

"url"
"content_type"
"http_code"
"header_size"
"request_size"
"filetime"
"ssl_verify_result"
"redirect_count"
"total_time"
"namelookup_time"
"connect_time"
"pretransfer_time"
"size_upload"
"size_download"
"speed_download"
"speed_upload"
"download_content_length"
"upload_content_length"
"starttransfer_time"
"redirect_time"

Detecting browser-based redirection

In this first example we will write code that detects URL redirections that depend on browser settings. For example, some websites redirect mobile phone browsers or other specific devices.

We are going to use the CURLOPT_HTTPHEADER option to set our outgoing HTTP headers, including the user agent string and the accepted languages. In the end we will be able to see which sites redirect us to different URLs.

// URLs to test
$urls = array(
    "http://www.cnn.com",
    "http://www.mozilla.com",
    "http://www.facebook.com"
);

// browsers to test
$browsers = array(
    "standard" => array(
        "user_agent" => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)",
        "language" => "en-us,en;q=0.5"
    ),
    "iphone" => array(
        "user_agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A537a Safari/419.3",
        "language" => "en"
    ),
    "french" => array(
        "user_agent" => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.50727)",
        "language" => "fr,fr-FR;q=0.5"
    )
);

foreach ($urls as $url) {
    echo "URL: $url\n";

    foreach ($browsers as $test_name => $browser) {
        $ch = curl_init();

        // set the url
        curl_setopt($ch, CURLOPT_URL, $url);

        // set the browser-specific headers
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            "User-Agent: " . $browser["user_agent"],
            "Accept-Language: " . $browser["language"]
        ));

        // we do not need the page contents
        curl_setopt($ch, CURLOPT_NOBODY, 1);

        // we need the HTTP headers
        curl_setopt($ch, CURLOPT_HEADER, 1);

        // return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

        $output = curl_exec($ch);
        curl_close($ch);

        // was there an HTTP redirect?
        if (preg_match("!Location: (.*)!", $output, $matches)) {
            // $matches[1] holds the captured redirect target
            echo "$test_name: redirects to " . trim($matches[1]) . "\n";
        } else {
            echo "$test_name: no redirection\n";
        }
    }
    echo "\n\n";
}

First we define the list of URLs to check. Then we define a set of browser settings to test each of these URLs against. After that we loop over every combination of URL and browser profile.

The cURL options used in this example make it return only the HTTP headers (stored in $output) rather than the page content. Then, with a simple regex, we check whether the returned headers contain a "Location:" line.

When you run this code, it prints one line for each URL and browser profile: either the redirect target or "no redirection".
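The same header check can be tried from the command line before writing any PHP. The following is a minimal sketch, assuming the response headers arrive on standard input; extract_location is a hypothetical helper name, and the URL and User-Agent string in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the first Location header, or nothing if there is none.
extract_location() {
  # HTTP header lines end in CRLF, so strip the carriage returns first;
  # the header name is compared case-insensitively.
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Usage sketch: -I requests only the headers, -A sets the User-Agent.
#   curl -s -I -A "Mozilla/5.0 (iPhone)" http://example.com/ | extract_location
```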

Sending a POST request to a URL

With a GET request, data can be passed to the URL via the "query string". For example, when you search on Google, the search term appears in the query string of the new URL:

http://www.google.com/search?q=ruseller

You do not need cURL to simulate this request. If you are feeling lazy, you can simply use file_get_contents() to get the result.

However, some HTML forms use the POST method. Their data is sent in the body of the HTTP request rather than in the query string. For example, if you fill in the search form on a forum and click the search button, a POST request will most likely be sent to:

http://codeigniter.com/forums/do_search/

We can write a PHP script that simulates this kind of request. First let's create a simple file that accepts and prints the POST data. Call it post_output.php:

print_r($_POST);

Next we create the PHP script that performs the cURL request:

$url = "http://localhost/post_output.php";

$post_data = array(
    "foo" => "bar",
    "query" => "Nettuts",
    "action" => "Submit"
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// this is a POST request
curl_setopt($ch, CURLOPT_POST, 1);

// add the POST fields
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);

$output = curl_exec($ch);
curl_close($ch);

echo $output;

When you run this script, it prints the contents of the $_POST array as seen by the server: the foo, query and action fields with the values we sent.

In other words, the POST request was sent to post_output.php, which dumped the $_POST superglobal, and we captured that output with cURL.

Uploading a file

First let's create a file that receives the upload request and call it upload_output.php:

print_r($_FILES);

And here is the script that performs the upload:

$url = "http://localhost/upload_output.php";

$post_data = array(
    "foo" => "bar",
    // the file to upload; CURLFile replaces the old "@path" string syntax
    "upload" => new CURLFile("C:/wamp/www/test.zip")
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);

$output = curl_exec($ch);
curl_close($ch);

echo $output;

To upload a file, pass it as a regular POST field wrapped in a CURLFile object (available since PHP 5.5). Older PHP versions accepted a file path string prefixed with the @ symbol instead; that syntax no longer works in PHP 7 and later. When you run the script, upload_output.php prints the $_FILES array, which contains an "upload" entry describing the received file (name, type, tmp_name, error and size).

Multi cURL

One of the most powerful features of cURL is the "multi" interface. It lets you open connections to multiple URLs simultaneously and asynchronously.

With a classic cURL request, script execution is blocked until the request completes. If you need to work with many URLs, this takes a significant amount of time, because you can only request one URL at a time. The multi handle removes this limitation.

Let's look at an example taken from php.net:

// create the cURL handles
$ch1 = curl_init();
$ch2 = curl_init();

// set the URL and other options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);

// create the multi cURL handle
$mh = curl_multi_init();

// add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

$active = null;

// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);

The idea is that you can run several cURL handles at once and, with a simple loop, keep track of which requests have not yet completed.

There are two main loops in this example. The first do-while loop calls curl_multi_exec() repeatedly. This function is non-blocking: it does as much work as it can and returns a status value. As long as the returned value is the constant CURLM_CALL_MULTI_PERFORM, there is still work to be done right away (for example, HTTP headers are being sent to the URLs), so we keep calling it until we get a different value.

The following while loop runs as long as the $active variable is true. This variable is passed as the second argument to curl_multi_exec() and stays true as long as any of the connections is still active. Next we call curl_multi_select(), which blocks until there is activity on one of the connections, for example until a response arrives. When that happens, we go into another do-while loop to continue executing the requests.
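On the command line, a rough equivalent is to run several curl processes side by side. Note that this is a different technique from the multi interface described above: each URL gets its own process rather than sharing one multi handle. The sketch assumes an xargs that supports the -P option (GNU and BSD versions do); check_parallel and the limit of 4 processes are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: read URLs on stdin, one per line, and fetch them with
# up to 4 curl processes at a time. Prints "<http_code> <url>" per request;
# the code is 000 when no HTTP response was received.
check_parallel() {
  xargs -n 1 -P 4 curl -s -o /dev/null -w '%{http_code} %{url_effective}\n'
}

# Usage sketch (placeholder URLs):
#   printf '%s\n' http://example.com/ http://example.org/ | check_parallel
```

The output lines can arrive in any order, because the requests finish independently of each other.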

Now let's apply what we have learned to an example that will be genuinely useful to a lot of people.

Checking links in WordPress

Imagine a blog with a huge number of posts, each containing links to external websites. Some of these links may already be dead for various reasons. Perhaps the page was removed, or the whole site is down.

We are going to build a script that analyzes all the links, finds the websites that do not load and the pages that return 404, and gives us a report.

I should say right away that this is not an example of building a WordPress plugin. It is simply a good testing ground for our experiments.

Let's finally get started. First we need to fetch all the links from the database:

// configuration
$db_host = "localhost";
$db_user = "root";
$db_pass = "";
$db_name = "wordpress";
$excluded_domains = array("localhost", "www.mydomain.com");
$max_connections = 10;

// initialize the variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;

// connect to MySQL (mysqli is used because the old mysql_* functions
// were removed in PHP 7)
$db = mysqli_connect($db_host, $db_user, $db_pass, $db_name);
if (!$db) {
    die("Could not connect: " . mysqli_connect_error());
}

// select all published posts that contain links
$q = "SELECT post_content FROM wp_posts
      WHERE post_content LIKE '%href=%'
      AND post_status = 'publish'
      AND post_type = 'post'";
$r = mysqli_query($db, $q) or die(mysqli_error($db));

while ($d = mysqli_fetch_assoc($r)) {
    // extract the links with a regular expression
    if (preg_match_all("!href=\"(.*?)\"!", $d["post_content"], $matches)) {
        // $matches[1] holds the captured URLs
        foreach ($matches[1] as $url) {
            // skip the excluded domains
            $tmp = parse_url($url);
            $host = (is_array($tmp) && isset($tmp["host"])) ? $tmp["host"] : "";
            if (in_array($host, $excluded_domains)) {
                continue;
            }
            // store the url
            $url_list[] = $url;
        }
    }
}

// remove duplicates
$url_list = array_values(array_unique($url_list));

if (!$url_list) {
    die("No URL to check");
}

First we set up the database configuration, then the list of domains that will be excluded from the check ($excluded_domains). We also set the maximum number of simultaneous connections the script will use ($max_connections). Then we connect to the database, select the posts that contain links, and collect the links into an array ($url_list).

The following code is a bit complex, so work through it from start to finish:

// 1. multi handle
$mh = curl_multi_init();

// 2. add multiple URLs to the multi handle
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}

// 3. initial execution
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// 4. main loop
while ($active && $mrc == CURLM_OK) {

    // 5. there is activity
    if (curl_multi_select($mh) != -1) {

        // 6. do the work
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        // 7. is there info? (a loop, since several requests may finish at once)
        while ($mhinfo = curl_multi_info_read($mh)) {
            // this means one of the requests has finished

            // 8. get the info on the curl handle
            $chinfo = curl_getinfo($mhinfo["handle"]);

            // 9. dead link?
            if (!$chinfo["http_code"]) {
                $dead_urls[] = $chinfo["url"];

            // 10. 404?
            } else if ($chinfo["http_code"] == 404) {
                $not_found_urls[] = $chinfo["url"];

            // 11. working
            } else {
                $working_urls[] = $chinfo["url"];
            }

            // 12. clean up
            curl_multi_remove_handle($mh, $mhinfo["handle"]);
            // if you run into an infinite loop, comment out this call
            curl_close($mhinfo["handle"]);

            // 13. add a new url and keep going
            if (add_url_to_multi_handle($mh, $url_list)) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
    }
}

// 14. finished
curl_multi_close($mh);

echo "==Dead URLs==\n";
echo implode("\n", $dead_urls) . "\n\n";

echo "==404 URLs==\n";
echo implode("\n", $not_found_urls) . "\n\n";

echo "==Working URLs==\n";
echo implode("\n", $working_urls);

// 15. adds a url to the multi handle
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;

    // if we have another url to fetch
    if (isset($url_list[$index])) {

        // new curl handle
        $ch = curl_init();

        // set the url
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        // do not print the result to the output
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // follow redirections
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // we only need the headers, not the body
        curl_setopt($ch, CURLOPT_NOBODY, 1);

        curl_multi_add_handle($mh, $ch);

        // move on to the next url
        $index++;

        return true;
    } else {
        // no more URLs to add
        return false;
    }
}

Here I will try to explain everything step by step. The numbers in the list correspond to the numbers in the code comments.

1. We create the multi handle;
2. The add_url_to_multi_handle() function is described in step 15. Every time it is called, a new URL starts being processed. Initially we add 10 ($max_connections) URLs;
3. To get things started, we must call curl_multi_exec(). As long as it returns CURLM_CALL_MULTI_PERFORM, there is still work to do. This is needed mainly to set up the connections;
4. Next comes the main loop, which runs as long as there is at least one active connection;
5. curl_multi_select() blocks until there is activity on any of the connections;
6. Again we must let cURL do some work, mainly fetching the response data;
7. Here we check for info. When a request has finished, an array is returned;
8. The returned array contains the cURL handle. We use it to get information about the individual cURL request;
9. If the link was dead or the request timed out, there will be no HTTP code;
10. If the link returned a 404 page, the HTTP code will be 404;
11. Otherwise we assume the link works. (You can add further checks for error code 500 and so on);
12. Next we remove the cURL handle, because we no longer need it;
13. Now we can add another URL and run everything described above again;
14. At this point the script is done. We can close the multi handle and print the report;
15. Finally, the function that adds a URL to the multi handle. The static variable $index is incremented every time the function is called.
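The classification in steps 9 to 11 is easy to try on its own from the command line. The sketch below assumes the status comes from curl's %{http_code} write-out variable, which prints 000 when no HTTP response was received; classify_status is a hypothetical helper name and the URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper mirroring steps 9 to 11:
# no HTTP code -> dead, 404 -> not_found, anything else -> working.
classify_status() {
  case "$1" in
    000|0|"") echo "dead" ;;
    404)      echo "not_found" ;;
    *)        echo "working" ;;
  esac
}

# Usage sketch: -I sends a HEAD request (like CURLOPT_NOBODY) and
# -L follows redirects (like CURLOPT_FOLLOWLOCATION).
#   code=$(curl -s -I -L -o /dev/null -w '%{http_code}' http://example.com/page)
#   classify_status "$code"
```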

I ran this script on my blog (with a few broken links added on purpose to test it). In my case, the script needed a little under 2 seconds to go through 40 URLs. The performance gain becomes significant when you work with even larger numbers of URLs. If you open ten connections at the same time, the script can run up to ten times faster.

A few words about other useful cURL options

HTTP Authentication

If a URL is protected by HTTP authentication, you can use the following script:

$url = "http://www.somesite.com/members/";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// send the username and password
curl_setopt($ch, CURLOPT_USERPWD, "myusername:mypassword");

// if the server redirects, follow the redirect
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

// keep sending the username and password after a redirect
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, 1);

$output = curl_exec($ch);
curl_close($ch);

FTP upload

PHP also has an FTP library, but nothing stops you from using cURL here as well:

// open the file
$fp = fopen("/path/to/file", "r");

// the url carries the login, host, port and remote path
// (the host name here is a placeholder)
$url = "ftp://username:password@ftp.example.com:21/path/to/new/file";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// upload-related options
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize("/path/to/file"));

// use ASCII mode (CURLOPT_FTPASCII is an older alias of this option)
curl_setopt($ch, CURLOPT_TRANSFERTEXT, 1);

$output = curl_exec($ch);
curl_close($ch);
fclose($fp);

Using a proxy

You can perform your URL request through a proxy:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// set the proxy address
curl_setopt($ch, CURLOPT_PROXY, "11.11.11.11:8080");

// if the proxy requires a username and password
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "user:pass");

$output = curl_exec($ch);
curl_close($ch);

Callback functions

It is also possible to have cURL call a function while the request is still in progress. For example, while the response body is downloading, you can start using the data without waiting for the whole download to complete.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://net.tutsplus.com");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "progress_function");

curl_exec($ch);
curl_close($ch);

// called for every chunk of the response as it arrives
function progress_function($ch, $str) {
    echo $str;
    return strlen($str);
}

Such a function MUST return the length of the string it was given; this is a requirement.

Conclusion

Today we looked at how the cURL library can be put to work for your own purposes. I hope you enjoyed this article.

Thank you! Have a great day!